MS-Align+ Manual

1. About MS-Align+
    1.1. Supported data types
    1.2. MS-Align+ pipeline
    1.3. Performance of MS-Align+
2. Installation
    2.1. Downloading MS-Align+
3. Running MS-Align+
    3.1. Input
    3.2. Parameters
    3.3. Commands
    3.4. Output
4. Citation
5. Feedback and bug reports
6. New software tool

1. About MS-Align+

MS-Align+ is a software tool for top-down protein identification based on spectral alignment that enables searches for unexpected post-translational modifications. This manual helps you install and run MS-Align+.

1.1 Supported data types

The current version of MS-Align+ works with Thermo raw data.

1.2 MS-Align+ pipeline

The MS-Align+ pipeline for analyzing raw data includes three components: format conversion, deconvolution, and database search.

1.3 Performance of MS-Align+

For performance results, see Liu et al., 2012.

2. Installation

MS-Align+ requires a computer with at least 12 GB of memory, a 64-bit Linux or Microsoft Windows operating system, and Java Runtime Environment 7.0.

2.1 Downloading MS-Align+

Download the zipped file MS-Align+ 0.7.1.7143 and extract it to install MS-Align+.

To extract the zipped file on Microsoft Windows, right-click the file, click "Extract all", and follow the instructions.

To extract the zipped file on a Linux system, use the following command:


unzip MS-Align-0.7.1.7143.zip 

3. Running MS-Align+

3.1 Input

MS-Align+ needs three input files: a protein database file in FASTA format, a spectrum data file, and a configuration file. All three files should be in the directory INSTALL_DIR/msalign+/msinput/, where INSTALL_DIR is the directory in which MS-Align+ is installed.

The protein database file in FASTA format can be downloaded from SWISS-PROT or other protein databases.
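Before starting a search it can help to confirm that the downloaded database parses as FASTA. The following is a minimal sketch, not part of MS-Align+; the function name read_fasta and the two example entries are hypothetical and serve only as an illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical entries, used only to show the call.
example = """>sp|P00001|EXAMPLE_1 hypothetical entry
MKTAYIAKQR
QISFVKSHFS
>sp|P00002|EXAMPLE_2 hypothetical entry
GSHMLEDPVA
""".splitlines()

records = list(read_fasta(example))
print(len(records), "entries read")
```

For a real database, pass an open file instead of the example list, for instance read_fasta(open("prot.fasta")).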

The spectrum data file, in a format similar to MGF, can be obtained by using MS-Deconv. Please see the manual of MS-Deconv for details.

The name of the configuration file is input.properties. The user can edit this file to set parameters of MS-Align+.
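As a quick pre-flight check, the sketch below lists which of the three input files are absent from INSTALL_DIR/msalign+/msinput/. It is an illustration only: the helper name missing_inputs is hypothetical, and the database and spectrum file names are whatever you set in input.properties.

```python
from pathlib import Path


def missing_inputs(install_dir, database_name, spectrum_name):
    """Return the names of expected input files that are not present."""
    input_dir = Path(install_dir) / "msalign+" / "msinput"
    expected = [database_name, spectrum_name, "input.properties"]
    return [name for name in expected if not (input_dir / name).is_file()]


# Example call; "." stands in for INSTALL_DIR, and the file names are
# the ones used in the sample configuration.
print(missing_inputs(".", "prot.fasta", "spectra.msalign"))
```

An empty list means all three files were found.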

3.2 Parameters

Here is a sample configuration file, input.properties:


databaseFileName=prot.fasta
spectrumFileName=spectra.msalign
activation=FILE
searchType=TARGET
cysteineProtection=C0
shiftNumber=2
errorTolerance=15
cutoffType=EVALUE
cutoff=0.01
doOneDaltonCorrection=false
doChargeCorrection=false
tableOutputFileName=result_table.txt
detailOutputFileName=result_detail.txt

Basic options

databaseFileName
    Specify the name of the protein database file.

spectrumFileName
    Specify the name of the spectrum file.

activation
    Specify the fragmentation type of the tandem mass spectra. The parameter can be set to CID, HCD, ETD, or FILE. When activation=FILE, MS-Align+ uses the fragmentation information provided in the spectrum file.

searchType
    The parameter searchType can be either TARGET or TARGET+DECOY. When searchType=TARGET+DECOY, a concatenated target+decoy database will be generated and false discovery rates will be calculated.

cysteineProtection
    The parameter cysteineProtection can be set to C0, C57, or C58. C0: no modification; C57: carbamidomethylation; C58: carboxymethylation.

shiftNumber
    The maximum number of unexpected post-translational modifications in a protein-spectrum-match. The parameter can be set to 0, 1, or 2.

errorTolerance
    The error tolerance for precursor and fragment masses in parts per million (ppm). The default value is 15. For example, 15 ppm of a 10,000 Da mass is 0.15 Da.

cutoffType
    The type of the cutoff value for reporting identified protein-spectrum-matches. The parameter can be set to EVALUE or FDR. When cutoffType=FDR, searchType must be set to TARGET+DECOY.

cutoff
    The cutoff value for reporting identified protein-spectrum-matches. If cutoffType=FDR, the value is a false discovery rate (FDR) cutoff. If cutoffType=EVALUE, the value is an E-value cutoff.

doOneDaltonCorrection
    The parameter can be set to true or false. If doOneDaltonCorrection=true, MS-Align+ will try to automatically correct +/-1 Da errors introduced in the deconvolution of precursor ions. This function has not been thoroughly tested.

doChargeCorrection
    The parameter can be set to true or false. If doChargeCorrection=true, MS-Align+ will try to automatically correct precursor mass errors that arise when a deconvolution tool reports an incorrect charge state for a precursor ion. This function has not been thoroughly tested.

tableOutputFileName
    The name of the output file in tab-delimited format.

detailOutputFileName
    The name of the output file in XML format.
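The allowed values listed above can be checked before a long search is started. The sketch below is an illustration under stated assumptions, not part of MS-Align+: the names parse_properties, validate, and ALLOWED are hypothetical, the parser handles only plain key=value lines (not the full Java properties syntax), and keys absent from the file are not reported because the manual does not say which parameters are mandatory.

```python
# Allowed values as described in the parameter list above.
ALLOWED = {
    "activation": {"CID", "HCD", "ETD", "FILE"},
    "searchType": {"TARGET", "TARGET+DECOY"},
    "cysteineProtection": {"C0", "C57", "C58"},
    "shiftNumber": {"0", "1", "2"},
    "cutoffType": {"EVALUE", "FDR"},
    "doOneDaltonCorrection": {"true", "false"},
    "doChargeCorrection": {"true", "false"},
}


def parse_properties(text):
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def validate(props):
    """Return a list of human-readable problems found in the settings."""
    problems = []
    for key, allowed in ALLOWED.items():
        value = props.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key}={value} is not one of {sorted(allowed)}")
    # Rule stated under cutoffType: FDR needs a target+decoy search.
    if props.get("cutoffType") == "FDR" and props.get("searchType") != "TARGET+DECOY":
        problems.append("cutoffType=FDR requires searchType=TARGET+DECOY")
    return problems


# A deliberately inconsistent fragment, used only to show the checks.
sample = """\
searchType=TARGET
cutoffType=FDR
shiftNumber=3
"""
for problem in validate(parse_properties(sample)):
    print(problem)
```

To check a real configuration, read msinput/input.properties into a string and pass it to parse_properties.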

3.3 Commands

A desktop with at least 12 GB of memory is required to run MS-Align+. To run MS-Align+, open a console on Linux or a command-line interpreter on Microsoft Windows, change to the directory where MS-Align+ is installed, and run the following commands:

Windows:

cd msalign+
java -Xmx12G -classpath jar\*; edu.ucsd.msalign.align.console.MsAlignPipeline .\

Linux:

cd msalign+
java -Xmx12G -classpath jar/*: edu.ucsd.msalign.align.console.MsAlignPipeline ./
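If you launch MS-Align+ from a script, the two command variants above differ only in the path and classpath separators. The sketch below assembles the same argument list for the current platform; the helper name build_command is hypothetical, and the code only builds and prints the command without running it.

```python
import os


def build_command(memory_gb=12):
    """Assemble the java invocation shown above for the current platform."""
    # "jar/*:" on Linux, "jar\*;" on Windows, as in the commands above.
    classpath = os.path.join("jar", "*") + os.pathsep
    # "./" on Linux, ".\" on Windows.
    working_dir = "." + os.sep
    return [
        "java",
        f"-Xmx{memory_gb}G",
        "-classpath",
        classpath,
        "edu.ucsd.msalign.align.console.MsAlignPipeline",
        working_dir,
    ]


print(" ".join(build_command()))
```

The list can be passed to subprocess.run with cwd set to the msalign+ directory. Because no shell is involved in that case, the classpath wildcard reaches Java unexpanded, and Java resolves it itself.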

3.4 Output

MS-Align+ writes its results in three formats: a tab-delimited text file, an XML file, and HTML files. The tab-delimited text file and the XML file, whose names are specified in the configuration file, are in the directory msoutput. The files in the directory html, including proteins.html, provide web pages for displaying identified protein-spectrum-matches.
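The tab-delimited file can be loaded with any tool that reads tab-separated text. As a minimal sketch, the helper below (read_table, a hypothetical name) returns the rows as lists of strings. It makes no assumption about the column names, which this manual does not list, and the two-column demo content is invented purely to show the call.

```python
import csv
import io


def read_table(handle):
    """Return the rows of a tab-delimited file as lists of strings."""
    return list(csv.reader(handle, delimiter="\t"))


# Hypothetical two-column content, used only to show the call.
demo = io.StringIO("column_a\tcolumn_b\nvalue_1\tvalue_2\n")
rows = read_table(demo)
print(len(rows), "rows;", len(rows[0]), "columns")
```

For real results, open the file named by tableOutputFileName in msoutput with open(path, newline="") and pass the handle to read_table.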

4. Citation

If you use MS-Align+ in your research, please include Liu et al., 2012 in your reference list.

5. Feedback and bug reports

Your comments, bug reports, and suggestions are very welcome. They will help us to further improve MS-Align+.

If you have any trouble running MS-Align+, please email us at liuxiaowencs@gmail.com or post your questions to the MS-Align+ Google group.

6. New software tool

TopPIC (TOP-Down Mass Spectrometry Based Proteoform Identification and Characterization) is a new software tool for identifying and characterizing proteoforms at the whole-proteome level from top-down tandem mass spectra using database search. It uses several techniques, such as indexes, spectral alignment, and a generation function method, to increase speed, sensitivity, and accuracy. It also provides a web-browser-based user interface. You can download TopPIC here.