TopDIA is the first software tool for top-down proteoform identification using TD-DIA-MS data. TopDIA generates pseudo non-multiplexed MS/MS spectra from TD-DIA-MS data by integrating algorithms for detecting and matching proteoform and fragment features.
The input to TopDIA consists of mass spectrometry data files in the mzML format. Raw mass spectral data generated by various mass spectrometers can be converted to mzML files using msconvert.
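The conversion step can be scripted. The following is a minimal sketch, not part of TopDIA, that assembles an msconvert command line from Python. It assumes msconvert (from ProteoWizard) is installed, that the raw file is named spectra.raw (a hypothetical name), and that centroiding with the peakPicking filter is wanted, since the examples later in this manual use centroid data. The helper name msconvert_command is made up for illustration.

```python
import shlex

def msconvert_command(raw_file, out_dir="."):
    """Build an msconvert argument list that writes a centroided mzML file.

    Assumption: peak picking on all MS levels ("1-") is desired.
    """
    return [
        "msconvert", raw_file,
        "--mzML",                           # write mzML output
        "--filter", "peakPicking true 1-",  # centroid MS1 and MS/MS spectra
        "-o", out_dir,                      # output directory
    ]

cmd = msconvert_command("spectra.raw")
# Print the command so it can be inspected before it is run,
# for example with subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Building the argument list first and printing it makes it easy to check the filter settings before any long conversion is started.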
TopDIA outputs LC-MS feature files for MS1 and MS/MS data with the file extension "csv", and two deconvoluted mass spectral data files with the file extension "msalign" in the msalign format, which is similar to the MGF file format. In addition, TopDIA outputs pseudo-MS/MS spectral data in the msalign format with the file extension "pseudo_ms2.msalign".
For example, when the input file name is spectra.mzML, the output includes feature (.csv) files, spectra_ms1.msalign, spectra_ms2.msalign, and spectra_pseudo_ms2.msalign.
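The msalign output names used in this manual's examples share the base name of the input file. As a rough sketch of that naming pattern (the helper msalign_outputs is hypothetical, and the names of the feature csv files are not derived here because the manual does not spell them out):

```python
from pathlib import Path

def msalign_outputs(mzml_file):
    """Return the msalign file names expected for one mzML input.

    Assumption: outputs are written next to the input file and follow
    the <base>_ms1 / <base>_ms2 / <base>_pseudo_ms2 pattern used in
    the examples of this manual.
    """
    path = Path(mzml_file)
    suffixes = ("ms1", "ms2", "pseudo_ms2")
    return [str(path.with_name(f"{path.stem}_{s}.msalign")) for s in suffixes]

names = msalign_outputs("spectra.mzML")
for name in names:
    print(name)
```

A helper like this can be used to check that a run has finished, for example by testing whether each returned path exists.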
To run TopDIA, open a terminal window and run the following command.
topdia [options] spectrum-file-names
-h [ --help ]
Print the help message.
-a [ --activation ] <CID|ETD|HCD|MPD|UVPD|FILE>
Set the fragmentation method(s) of MS/MS spectra. When "FILE" is selected, the fragmentation methods of spectra are given in the input spectrum data file. Default value: FILE.
-c [ --max-charge ] <a positive integer>
Set the maximum charge state of precursor and fragment ions. The default value is 30.
-m [ --max-mass ] <a positive number>
Set the maximum monoisotopic mass of precursor and fragment ions. The default value is 70,000 Dalton.
-e [ --mz-error ] <a positive number>
Set the error tolerance of m/z values of spectral peaks. The default value is 0.02 m/z.
-r [ --ms-one-sn-ratio ] <a positive number>
Set the signal/noise ratio for MS1 spectra. The default value is 3.
-s [ --ms-two-sn-ratio ] <a positive number>
Set the signal/noise ratio for MS/MS spectra. The default value is 1.
-n [ --msdeconv ]
Use the MS-Deconv score (see paper) to rank isotopic envelopes. If -n is not selected, the default EnvCNN score (see paper) is used to rank isotopic envelopes.
-w [ --precursor-window ] <a positive number>
Set the precursor isolation window size. The default value is 4.0 m/z. When the input file contains the information of precursor windows, the parameter will be ignored.
-t [ --ms1-ecscore-cutoff ] <a positive number in [0, 1]>
Set the ECScore cutoff value for proteoform features. The default value is 0.
-T [ --ms2-ecscore-cutoff ] <a positive number in [0, 1]>
Set the ECScore cutoff value for fragment features. The default value is 0.
-b [ --ms1-min-scan-number ] <1|2|3>
The minimum number of MS1 scans in which a proteoform feature is detected. The default value is 2.
-B [ --ms2-min-scan-number ] <1|2|3>
The minimum number of MS2 scans in which a fragment feature is detected. The default value is 1.
-i [ --single-scan-noise ]
Use the peak intensity noise levels in single MS1 scans to filter out low intensity peaks in proteoform feature detection. The default method is to use the peak intensity noise level of the whole LC-MS map to filter out low intensity peaks.
-p [ --ms1-intensity-correlation-cutoff ] <a positive number in [0, 1]>
Set the MS1 seed envelope intensity correlation cutoff value for extracting proteoform features. The default value is 0.5.
-P [ --ms2-intensity-correlation-cutoff ] <a positive number in [0, 1]>
Set the MS2 seed envelope intensity correlation cutoff value for extracting fragment features. The default value is 0.
-v [ --pseudo-cutoff ] <a positive number in [0, 1]>
Set the Pseudo Score cutoff value for generating pseudo-MS/MS spectra. The default value is 0.55.
-V [ --pseudo-peak-number ]
Set the minimum number of peaks in a pseudo-MS/MS spectrum. The default value is 25.
-d [ --disable-final-filtering ]
Skip the final filtering of isotopic envelopes in MS/MS spectra.
-u [ --thread-number ] <a positive integer>
Number of CPU threads used in spectral deconvolution. Default value: 1.
-g [ --skip-html-folder ]
Skip the generation of HTML files for visualization.
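When many files are processed with different settings, the command line can be assembled programmatically. The sketch below is an illustration rather than an official interface: the function topdia_command is hypothetical, it uses the long option names listed above, and it assumes that topdia accepts a long option and its value as two separate arguments.

```python
import shlex

def topdia_command(spectrum_files, **options):
    """Assemble a topdia argument list from long option names.

    Keys use underscores in place of hyphens (max_charge -> --max-charge).
    A value of True emits a bare flag; False or None omits the option.
    """
    cmd = ["topdia"]
    for name, value in options.items():
        if value is None or value is False:
            continue
        cmd.append("--" + name.replace("_", "-"))
        if value is not True:
            cmd.append(str(value))  # option that takes a value
    cmd.extend(spectrum_files)      # spectrum files come last
    return cmd

cmd = topdia_command(["spectra.mzML"], ms1_ecscore_cutoff=0.2, ms1_min_scan_number=1)
print(shlex.join(cmd))
```

The function only builds the argument list. Passing it to subprocess.run(cmd, check=True) would start the actual run.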
Deconvolute a centroid data file spectra.mzML and output feature (.csv) files, spectra_ms1.msalign, spectra_ms2.msalign, and spectra_pseudo_ms2.msalign.
topdia spectra.mzML
Deconvolute a centroid data file spectra.mzML. In pseudo-MS/MS spectrum generation, each proteoform feature is required to be detected in at least one MS1 scan and the ECScore cutoff for proteoform features is set to 0.2.
topdia -t 0.2 -b 1 spectra.mzML
For example, to analyze a DIA-MS data file covering the m/z range [720, 800] for DIA-Test-Replicate-1, process the mzML file with the following settings: set the maximum charge state to 60 and use single-scan noise intensity levels during proteoform and fragment feature extraction.
topdia -c 60 -i 20231117_DIA_720_800_rep2.mzML
This generates the file 20231117_DIA_720_800_rep2_pseudo_ms2.msalign, which TopPIC then uses for proteoform identification with the following settings: no TopFD feature files, a shuffled decoy protein database to estimate spectrum-level and proteoform-level FDRs, a text file containing the information of variable PTMs, and an E. coli protein database.
toppic -x -d -t FDR -T FDR -b var_mods.txt Ecoli.fasta 20231117_DIA_720_800_rep2_pseudo_ms2.msalign
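The two steps above can be chained in a small script. The following sketch simply reuses the two command lines shown in this example. The function names pipeline_commands and run_pipeline are hypothetical, and the script assumes that topdia and toppic are on the PATH and that var_mods.txt and Ecoli.fasta are in the working directory.

```python
import shutil
import subprocess
from pathlib import Path

def pipeline_commands(mzml_file, fasta="Ecoli.fasta", var_mods="var_mods.txt"):
    """Return the topdia and toppic argument lists for one mzML file."""
    mzml = Path(mzml_file)
    # Pseudo-MS/MS output name, following the pattern used in this example.
    pseudo = mzml.with_name(mzml.stem + "_pseudo_ms2.msalign")
    topdia = ["topdia", "-c", "60", "-i", str(mzml)]
    toppic = ["toppic", "-x", "-d", "-t", "FDR", "-T", "FDR",
              "-b", var_mods, fasta, str(pseudo)]
    return topdia, toppic

def run_pipeline(mzml_file):
    """Run topdia, then toppic, stopping at the first failure."""
    for cmd in pipeline_commands(mzml_file):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)

topdia_cmd, toppic_cmd = pipeline_commands("20231117_DIA_720_800_rep2.mzML")
print(" ".join(topdia_cmd))
print(" ".join(toppic_cmd))
```

Calling run_pipeline("20231117_DIA_720_800_rep2.mzML") would execute both tools in order. With check=True, a non-zero exit status from topdia prevents toppic from running on a missing or incomplete msalign file.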