MS-Align-E

Usage

This release is an alpha version. Please contact us if you find bugs in the program.

The input of MS-Align-E consists of a protein database file in the FASTA format and a spectral data file in the msalign format, which is similar to the MGF file format.

Convert raw files to msalign files

RAW files generated from Thermo Scientific™ Orbitrap™ mass spectrometers are used as an example to show how to convert raw files to msalign files. A raw data file, e.g., spectra.raw, is first converted into a centroided mzXML file using msconvert (msconvert is available only on Microsoft Windows™ systems).

msconvert.exe --filter "peakPicking true 1-" --mzXML --zlib spectra.raw

The filter "peakPicking true 1-" produces a centroided (rather than profile) mzXML file, which the spectral deconvolution tool MS-Deconv requires.

The resulting centroided mzXML file spectra.mzXML is then converted into an msalign file using MS-Deconv.

java -jar msdeconv.jar spectra.mzXML

The resulting file spectra_msdeconv.msalign is used as the input spectral data file of MS-Align-E.
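The two conversion steps above can be sketched as a small shell helper that prints the commands for a given raw file so they can be reviewed before running. The function name conversion_commands is hypothetical. The sketch assumes the raw file sits in the current directory and that the mzXML file keeps the base name of the raw file, as in the spectra.raw example.

```shell
# Hypothetical helper (not part of msconvert or MS-Deconv): print the two
# conversion commands for one raw file.
conversion_commands() {
  raw="$1"
  # Assumption: the mzXML file keeps the base name of the raw file,
  # as in the spectra.raw -> spectra.mzXML example above.
  mzxml="${raw%.*}.mzXML"
  printf 'msconvert.exe --filter "peakPicking true 1-" --mzXML --zlib %s\n' "$raw"
  printf 'java -jar msdeconv.jar %s\n' "$mzxml"
}

conversion_commands spectra.raw
```

Because msconvert runs on Windows, the first printed command would typically be run there, while the MS-Deconv step can be run wherever Java and msdeconv.jar are available.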

Run MS-Align-E in Linux

Copy your protein database (XXX.fasta) and the .msalign file to the directory from which you run MS-Align-E.

MS-Align-E is very slow when the protein database is large. If you know the target protein, keep ONLY ONE protein sequence in the .fasta file.
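One possible way to reduce a database to a single entry is to filter the FASTA file by header text. The sketch below uses awk. The function name extract_protein, the file names, and the accession P12345 are placeholders, and the sketch assumes a standard FASTA layout in which every entry starts with a line beginning with ">".

```shell
# Hypothetical helper (not part of MS-Align-E): print every FASTA entry whose
# header line contains the given text.
#   $1 = FASTA file, $2 = text to look for in the header (e.g. an accession)
extract_protein() {
  awk -v id="$2" '
    /^>/ { keep = (index($0, id) > 0) }   # new entry: decide whether to keep it
    keep { print }                        # print header and sequence lines
  ' "$1"
}

# Example usage (placeholder names):
#   extract_protein full.fasta 'P12345' > target.fasta
```

If the text matches more than one header, all matching entries are kept, so it is worth checking that the output contains exactly one ">" line before starting the search.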

Run the following command to search spectra against the protein database.

java -jar MsAlignEPipeline.jar database_file_name spectrum_file_name [options]

For example, if the spectrum file is named spectrum.msalign, the identification results are written to the file spectrum.OUTPUT_TABLE.
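The naming rule for the result file can be sketched in shell as follows. The function name result_table_name is hypothetical, and the sketch assumes only the spectrum.msalign to spectrum.OUTPUT_TABLE pattern described above.

```shell
# Hypothetical helper: derive the result file name from the spectrum file name,
# following the spectrum.msalign -> spectrum.OUTPUT_TABLE example above.
result_table_name() {
  printf '%s.OUTPUT_TABLE\n' "${1%.msalign}"
}

# Example usage (file names are placeholders):
#   java -jar MsAlignEPipeline.jar XXX.fasta spectrum.msalign
#   [ -s "$(result_table_name spectrum.msalign)" ] || echo "no results written" >&2
```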

Options

-h [ --help ]

Print the help message.

-a [ --activation ] <CID|HCD|ETD|FILE>

Activation type of tandem mass spectra. When FILE is used, the activation type information is given in the input spectral data file. Default value: FILE.

-c [ --cutoff ] <positive float value>

Cutoff value for filtering identifications, interpreted according to the cutoff type given with -t. Default value: 0.01.

-e [ --error ] <positive integer value>

Error tolerance for precursor and fragment masses in parts per million (ppm). Default value: 15.

-m [ --modification ] <0|1|2>

Maximum number of unexpected post-translational modifications in a proteoform-spectrum-match. Default value: 0.

-s [ --search ] <TARGET|TARGET+DECOY>

Search against the target protein database or a combined target+decoy (scrambled) protein database. Default value: TARGET.

-t [ --cutofftype ] <EVALUE|FDR>

Use either EVALUE or FDR to filter identified Protein-Spectrum-Matches. Default value: EVALUE.
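As an illustration of how the options combine, the sketch below builds a search command and rejects values outside the ranges listed above for -a and -m. The function name build_search_cmd and the file names are hypothetical. The helper only prints the command and does not run MS-Align-E.

```shell
# Hypothetical helper (not part of MS-Align-E): check two option values against
# the documented choices and print the resulting command line.
#   $1 = database file, $2 = spectrum file,
#   $3 = activation type (default FILE), $4 = max unexpected PTMs (default 0)
build_search_cmd() {
  db="$1"; spec="$2"; activation="${3:-FILE}"; mods="${4:-0}"
  case "$activation" in
    CID|HCD|ETD|FILE) ;;
    *) echo "invalid activation type: $activation" >&2; return 1 ;;
  esac
  case "$mods" in
    0|1|2) ;;
    *) echo "invalid modification count: $mods" >&2; return 1 ;;
  esac
  echo "java -jar MsAlignEPipeline.jar $db $spec -a $activation -m $mods"
}

build_search_cmd XXX.fasta spectrum.msalign HCD 1
```

Other options, such as -e, -s, -t, and -c, can be appended to the printed command in the same way.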