Create the folders below for software packages and data sets used in this tutorial.
toppic_tutorial on the C: drive of your system.
toppic in the folder C:\toppic_tutorial\ for the TopPIC suite software.
tutorial_1 in the folder C:\toppic_tutorial\.
tutorial_2 in the folder C:\toppic_tutorial\.
tutorial_3 in the folder C:\toppic_tutorial\.
The resulting folder structure is shown in the screenshot below.
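The folder layout above can also be created with a short script. The following is a minimal Python sketch, not part of the TopPIC suite; the function name `create_tutorial_folders` is hypothetical, and the folder names are taken from the list above.

```python
from pathlib import Path

# Folder names taken from the list in this section.
SUBFOLDERS = ["toppic", "tutorial_1", "tutorial_2", "tutorial_3"]


def create_tutorial_folders(root):
    """Create the root folder and its four subfolders.

    Folders that already exist are left untouched.
    Returns the list of subfolder paths.
    """
    root = Path(root)
    created = []
    for name in SUBFOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Example call on Windows (not executed here):
# create_tutorial_folders(r"C:\toppic_tutorial")
```

Passing a different root folder makes it possible to keep the tutorial on another drive, provided the paths in the later steps are adjusted accordingly.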
MSConvert is a software tool in ProteoWizard that converts raw files into various spectrum file formats.
Follow the steps below to download ProteoWizard:
C:\toppic_tutorial\toppic\
.C:\toppic_tutorial\toppic\
.
In the MS experiment, the protein extract of S. typhimurium was reduced with dithiothreitol and alkylated with iodoacetamide. The protein mixture was first separated by gas-phase fractionation, resulting in 7 fractions. Each fraction was separated by an HPLC system coupled with an LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific). MS and MS/MS spectra were collected at a resolution of 60,000 and 30,000, respectively. In this tutorial, we use only the data files of two fractions (st_1.raw and st_2.raw).
Click here to download the data set, save it in the folder C:\toppic_tutorial\tutorial_1\, and unzip it in the same folder.
A S. typhimurium proteome database of 1,799 proteins was downloaded from the UniProt database.
Click here to download the protein database and save it in the folder C:\toppic_tutorial\tutorial_1\.
The folder C:\toppic_tutorial\tutorial_1\ is shown in the screenshot below.
We use TopIndex to generate index files from the protein database. The index files speed up the database searches of TopPIC and TopMG. Although TopIndex supports multithreading, users with a spinning hard disk may get better performance with a single thread than with multiple threads.
topindex_gui.exe in the folder C:\toppic_tutorial\toppic.
c:\toppic_tutorial\tutorial_1\uniprot-st.fasta.
C57 as the fixed modification.
Decoy database.
The screenshot of topindex_gui is shown below.
TopIndex generates a folder C:\toppic_tutorial\tutorial_1\uniprot-st.fasta_idx containing index files.
In the analysis, C57 is selected as the fixed modification because proteins were reduced with dithiothreitol and alkylated with iodoacetamide before the MS experiment. When proteins are not reduced, no fixed modification should be selected.
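To make the effect of the fixed modification concrete, the sketch below computes the total mass added to a protein when every cysteine carries it. It assumes that C57 denotes carbamidomethylation of cysteine, the product of iodoacetamide alkylation, with a monoisotopic mass shift of 57.021464 Da. The function name is hypothetical and the code is an illustration, not the way TopPIC applies the modification internally.

```python
# Monoisotopic mass (Da) added to a cysteine by carbamidomethylation (C2H3NO).
# Assumption: this is the modification that the C57 option refers to.
CARBAMIDOMETHYL_SHIFT = 57.021464


def fixed_mod_mass_shift(sequence, residue="C", shift=CARBAMIDOMETHYL_SHIFT):
    """Return the total mass added to `sequence` when every `residue`
    carries the fixed modification."""
    return sequence.upper().count(residue) * shift


# A sequence with two cysteines gains 2 * 57.021464 Da.
two_cys_shift = fixed_mod_mass_shift("ACDCK")
```

When proteins are not reduced and alkylated, the shift is not applied, which corresponds to selecting no fixed modification.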
We use MSConvertGUI to convert the raw files st_1.raw and st_2.raw to mzML files.
c:\toppic_tutorial\tutorial_1\st_1.raw and c:\toppic_tutorial\tutorial_1\st_2.raw as input files.
The screenshot of MSConvertGUI is shown below.
In the above file format conversion, the peak picking filter (step 3) is used to generate centroid, not profile, mzML data files, which are required by the spectral deconvolution tool TopFD.
The resulting mzML files are C:\toppic_tutorial\tutorial_1\st_1.mzML and C:\toppic_tutorial\tutorial_1\st_2.mzML. The sizes of the two files are about 41 MB and 47 MB, respectively. They can be downloaded here. The running time for the file format conversion is less than one minute.
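Because TopFD requires centroid data, it can be useful to confirm that the converted files are labeled as centroided. The sketch below counts the controlled-vocabulary accessions for centroid spectra (MS:1000127) and profile spectra (MS:1000128) in the text of an mzML file. It is a rough text scan with hypothetical function names, not a full mzML parser, and the counts may include the file-level description as well as individual spectra.

```python
import re

CENTROID = "MS:1000127"  # PSI-MS term: centroid spectrum
PROFILE = "MS:1000128"   # PSI-MS term: profile spectrum


def count_spectrum_types(mzml_text):
    """Count references to centroid and profile spectra in mzML text."""
    accessions = re.findall(r'accession="(MS:\d+)"', mzml_text)
    return {
        "centroid": accessions.count(CENTROID),
        "profile": accessions.count(PROFILE),
    }


def count_spectrum_types_in_file(path):
    """Read an mzML file as text and count its spectrum type labels."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_spectrum_types(handle.read())


# Example call (not executed here):
# count_spectrum_types_in_file(r"C:\toppic_tutorial\tutorial_1\st_1.mzML")
```

A nonzero profile count suggests that the peak picking filter was not applied to all MS levels during conversion.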
We use topfd_gui for top-down mass spectral deconvolution.
topfd_gui.exe in the folder C:\toppic_tutorial\toppic.
c:\toppic_tutorial\tutorial_1\st_1.mzML and c:\toppic_tutorial\tutorial_1\st_2.mzML as input files.
The screenshot of topfd_gui is shown below.
TopFD reports eight text files and four folders.
C:\toppic_tutorial\tutorial_1\st_1_ms2.msalign
C:\toppic_tutorial\tutorial_1\st_2_ms2.msalign
C:\toppic_tutorial\tutorial_1\st_1_ms1.feature
C:\toppic_tutorial\tutorial_1\st_1_ms2.feature
C:\toppic_tutorial\tutorial_1\st_2_ms1.feature
C:\toppic_tutorial\tutorial_1\st_2_ms2.feature
C:\toppic_tutorial\tutorial_1\st_1_feature.xml
C:\toppic_tutorial\tutorial_1\st_2_feature.xml
C:\toppic_tutorial\tutorial_1\st_1_html
C:\toppic_tutorial\tutorial_1\st_2_html
C:\toppic_tutorial\tutorial_1\st_1_file
C:\toppic_tutorial\tutorial_1\st_2_file
The output files and folders can be downloaded here.
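The listing above follows a regular naming pattern: four files and two folders per input file. The sketch below derives the expected names from an input file stem such as st_1 and reports which ones are missing. The function names are hypothetical, and the pattern is inferred only from the listing in this tutorial.

```python
from pathlib import Path


def expected_topfd_outputs(folder, stem):
    """Return (files, folders) expected for one input, e.g. stem 'st_1'."""
    folder = Path(folder)
    files = [
        folder / f"{stem}_ms2.msalign",
        folder / f"{stem}_ms1.feature",
        folder / f"{stem}_ms2.feature",
        folder / f"{stem}_feature.xml",
    ]
    folders = [folder / f"{stem}_html", folder / f"{stem}_file"]
    return files, folders


def missing_topfd_outputs(folder, stems):
    """List expected output files and folders that do not exist yet."""
    missing = []
    for stem in stems:
        files, folders = expected_topfd_outputs(folder, stem)
        missing += [p for p in files if not p.is_file()]
        missing += [p for p in folders if not p.is_dir()]
    return missing


# Example call (not executed here):
# missing_topfd_outputs(r"C:\toppic_tutorial\tutorial_1", ["st_1", "st_2"])
```

An empty result for the stems st_1 and st_2 corresponds to the eight files and four folders listed above.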
We use toppic_gui to search the MS/MS spectra in st_1_ms2.msalign and st_2_ms2.msalign against the protein database uniprot-st.fasta to identify proteoform-spectrum matches (PrSMs).
toppic_gui.exe in the folder C:\toppic_tutorial\toppic.
C:\toppic_tutorial\tutorial_1\uniprot-st.fasta as the protein database file.
C:\toppic_tutorial\tutorial_1\st_1_ms2.msalign and C:\toppic_tutorial\tutorial_1\st_2_ms2.msalign as mass spectrum data files.
C57 as the fixed modification.
Decoy database.
FDR as the spectrum level cutoff type.
FDR as the proteoform level cutoff type.
The screenshots of toppic_gui are shown below.
For each input msalign file, TopPIC reports two TSV files, an XML file, and collections of HTML files for identified proteoforms. For example, the output files for st_1_ms2.msalign are
C:\toppic_tutorial\tutorial_1\st_1_ms2_toppic_prsm.tsv
C:\toppic_tutorial\tutorial_1\st_1_ms2_toppic_proteoform.tsv
C:\toppic_tutorial\tutorial_1\st_1_ms2_toppic_proteoform.xml
C:\toppic_tutorial\tutorial_1\st_1_html\toppic_prsm_cutoff
C:\toppic_tutorial\tutorial_1\st_1_html\toppic_proteoform_cutoff
C:\toppic_tutorial\tutorial_1\st_1_html\topmsv
In addition, the identifications reported for st_1_ms2.msalign and st_2_ms2.msalign are combined and filtered by a 1% spectrum-level FDR and a 1% proteoform-level FDR. The combined results are reported in the following files.
C:\toppic_tutorial\tutorial_1\combined_ms2_toppic_prsm.tsv
C:\toppic_tutorial\tutorial_1\combined_ms2_toppic_proteoform.tsv
C:\toppic_tutorial\tutorial_1\combined_ms2_toppic_proteoform.xml
C:\toppic_tutorial\tutorial_1\combined_html\toppic_prsm_cutoff
C:\toppic_tutorial\tutorial_1\combined_html\toppic_proteoform_cutoff
C:\toppic_tutorial\tutorial_1\combined_html\topmsv
In the analysis, C57 is selected as the fixed modification because proteins were reduced with dithiothreitol and alkylated with iodoacetamide before the MS experiment. When proteins are not reduced, no fixed modification should be selected.
A shuffled decoy database is concatenated to the target database to estimate spectrum-level and proteoform-level FDRs. All identified PrSMs are first filtered by a 1% spectrum-level FDR, and the resulting PrSMs are reported in the file combined_ms2_toppic_prsm.tsv. The proteoforms corresponding to these PrSMs are further filtered using a 1% proteoform-level FDR, and the resulting proteoforms and their corresponding best PrSMs are reported in the file combined_ms2_toppic_proteoform.tsv. Microsoft Excel can be used to open these two files.
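The general idea of target-decoy filtering can be illustrated with a small example. The sketch below is a simplified illustration, not TopPIC's implementation: it assumes each identification has a score where higher is better and a flag marking decoy hits, and it estimates the FDR at a cutoff as the number of decoy hits divided by the number of target hits at or above that cutoff. The function name and the input format are hypothetical.

```python
def fdr_score_cutoff(matches, max_fdr=0.01):
    """Return the lowest target score at which the estimated FDR
    (decoys / targets among matches scoring at least that high)
    is still at most `max_fdr`, or None if no cutoff qualifies.

    `matches` is a list of (score, is_decoy) pairs; higher scores are better.
    """
    ranked = sorted(matches, key=lambda match: match[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
            continue
        targets += 1
        if decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Three target hits and one decoy hit, with a deliberately loose 40% FDR.
example = [(10, False), (9, False), (8, True), (7, False)]
loose_cutoff = fdr_score_cutoff(example, max_fdr=0.4)
```

With the example data, a stricter threshold stops above the decoy hit, while the loose 40% threshold admits the target hit ranked below it.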
To browse the PrSM identifications, go to the folder combined_html\topmsv and use Google Chrome (Microsoft Edge and Firefox are not recommended) to open the file index.html.
The output files can be downloaded here.
We use topindex to generate index files from the protein database uniprot-st.fasta to speed up database search of TopPIC and TopMG.
C:\toppic_tutorial\toppic\topindex.exe
C:\toppic_tutorial\tutorial_1\uniprot-st.fasta
cd c:\toppic_tutorial\tutorial_1
..\toppic\topindex -f C57 -d uniprot-st.fasta
We use topfd for top-down mass spectral deconvolution.
C:\toppic_tutorial\toppic\topfd.exe
C:\toppic_tutorial\tutorial_1\st_1.mzML
C:\toppic_tutorial\tutorial_1\st_2.mzML
cd c:\toppic_tutorial\tutorial_1
..\toppic\topfd st_*.mzML
We use toppic to search the MS/MS spectra in st_1_ms2.msalign and st_2_ms2.msalign against the protein database uniprot-st.fasta to identify PrSMs.
C:\toppic_tutorial\toppic\toppic.exe
C:\toppic_tutorial\tutorial_1\uniprot-st.fasta
C:\toppic_tutorial\tutorial_1\st_1_ms2.msalign
C:\toppic_tutorial\tutorial_1\st_2_ms2.msalign
C:\toppic_tutorial\tutorial_1\st_1_ms1.feature
C:\toppic_tutorial\tutorial_1\st_1_ms2.feature
C:\toppic_tutorial\tutorial_1\st_2_ms1.feature
C:\toppic_tutorial\tutorial_1\st_2_ms2.feature
cd c:\toppic_tutorial\tutorial_1
..\toppic\toppic -f C57 -d -t FDR -T FDR -c combined uniprot-st.fasta st_*_ms2.msalign
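The three command lines of this tutorial (topindex, topfd, and toppic) can be collected in a script so that the whole workflow is repeatable. The sketch below builds the same argument lists as the commands shown above and optionally runs them in order with Python's standard subprocess module. The function names are hypothetical, and the wildcard arguments are passed to the programs unchanged, as they are when typed at the Windows command prompt.

```python
import subprocess
from pathlib import PureWindowsPath


def tutorial_1_commands(toppic_dir=r"..\toppic"):
    """Return the three Tutorial 1 command lines as argument lists."""

    def tool(name):
        # Build a Windows-style path such as ..\toppic\topindex.
        return str(PureWindowsPath(toppic_dir) / name)

    return [
        [tool("topindex"), "-f", "C57", "-d", "uniprot-st.fasta"],
        [tool("topfd"), "st_*.mzML"],
        [tool("toppic"), "-f", "C57", "-d", "-t", "FDR", "-T", "FDR",
         "-c", "combined", "uniprot-st.fasta", "st_*_ms2.msalign"],
    ]


def run_all(commands, workdir):
    """Run each command in `workdir`, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=workdir, check=True)


# Example call on Windows (not executed here):
# run_all(tutorial_1_commands(), r"c:\toppic_tutorial\tutorial_1")
```

Because `check=True` raises an error when a program exits with a nonzero status, a failed indexing or deconvolution step prevents the later steps from running on incomplete input.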
We will use TopMG to analyze the data set st_1.raw described in Tutorial 1. TopMG is still in the development stage. Please let us know if you find any bugs in it.
Download the data set, save it in the folder C:\toppic_tutorial\tutorial_2\, and unzip it. It includes the following files.
C:\toppic_tutorial\tutorial_2\uniprot-st.fasta
C:\toppic_tutorial\tutorial_2\st_1_ms2.msalign
C:\toppic_tutorial\tutorial_2\st_1_ms1.feature
C:\toppic_tutorial\tutorial_2\st_1_ms2.feature
C:\toppic_tutorial\tutorial_2\variable_mods.txt
C:\toppic_tutorial\tutorial_2\st_1_html
C:\toppic_tutorial\tutorial_2\st_1_file
To speed up database search, follow the steps in Section 4.2.1 to generate index files for the database file uniprot-st.fasta. If index files have been generated, it is not necessary to regenerate index files. You can copy the index folder to the folder C:\toppic_tutorial\tutorial_2\.
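Copying the index folder can be scripted as well. The sketch below copies the folder uniprot-st.fasta_idx generated in Tutorial 1 into the Tutorial 2 folder and leaves an existing copy untouched. The function name is hypothetical, and the sketch assumes the index was built with the same fixed modification and decoy settings that the later search will use.

```python
import shutil
from pathlib import Path


def copy_index_folder(source_dir, target_dir, fasta_name="uniprot-st.fasta"):
    """Copy <fasta_name>_idx from source_dir to target_dir.

    Returns the path of the index folder in target_dir. An existing
    index folder in target_dir is not overwritten.
    """
    source = Path(source_dir) / f"{fasta_name}_idx"
    target = Path(target_dir) / f"{fasta_name}_idx"
    if not source.is_dir():
        raise FileNotFoundError(f"index folder not found: {source}")
    if target.exists():
        return target
    shutil.copytree(source, target)
    return target


# Example call on Windows (not executed here):
# copy_index_folder(r"C:\toppic_tutorial\tutorial_1",
#                   r"C:\toppic_tutorial\tutorial_2")
```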
topmg_gui.exe in the folder C:\toppic_tutorial\toppic.
C:\toppic_tutorial\tutorial_2\uniprot-st.fasta as the protein database file.
C:\toppic_tutorial\tutorial_2\st_1_ms2.msalign as a mass spectrum data file.
C:\toppic_tutorial\tutorial_2\variable_mods.txt as the file of variable PTMs.
C57 as the fixed modification.
Decoy database.
FDR as the spectrum level cutoff type.
FDR as the proteoform level cutoff type.
The screenshots of topmg_gui are shown below.
TopMG reports two TSV files, an XML file, and collections of HTML files for identified proteoforms.
C:\toppic_tutorial\tutorial_2\st_1_ms2_topmg_prsm.tsv
C:\toppic_tutorial\tutorial_2\st_1_ms2_topmg_proteoform.tsv
C:\toppic_tutorial\tutorial_2\st_1_ms2_topmg_proteoform.xml
C:\toppic_tutorial\tutorial_2\st_1_html\topmg_prsm_cutoff
C:\toppic_tutorial\tutorial_2\st_1_html\topmg_proteoform_cutoff
C:\toppic_tutorial\tutorial_2\st_1_html\topmsv
The output files can be downloaded here.
To browse the PrSM identifications, go to the folder st_1_html\topmsv and use Google Chrome (Microsoft Edge and Firefox are not recommended) to open the file index.html.
C:\toppic_tutorial\toppic\topmg.exe
C:\toppic_tutorial\tutorial_2\uniprot-st.fasta
C:\toppic_tutorial\tutorial_2\st_1_ms2.msalign
C:\toppic_tutorial\tutorial_2\st_1_ms1.feature
C:\toppic_tutorial\tutorial_2\st_1_ms2.feature
C:\toppic_tutorial\tutorial_2\variable_mods.txt
cd c:\toppic_tutorial\tutorial_2
..\toppic\topindex -f C57 -d uniprot-st.fasta
..\toppic\topmg -f C57 -d -t FDR -v 0.05 -T FDR -V 0.05 -i variable_mods.txt uniprot-st.fasta st_1_ms2.msalign
We will use TopPIC and TopDiff to compare the abundance of proteoforms and find differentially expressed proteoforms using two MS data files of Escherichia coli cells (ecoli_1.raw and ecoli_2.raw).
In the MS experiment, the protein extract of E. coli was reduced with dithiothreitol and alkylated with iodoacetamide. The protein mixture was separated by capillary zone electrophoresis and analyzed by an LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific). Technical duplicates were generated for testing proteoform quantification in two runs of the same sample.
Download the data set, save it in the folder C:\toppic_tutorial\tutorial_3\, and unzip it. It includes the following files.
C:\toppic_tutorial\tutorial_3\uniprot-ecoli.fasta
C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.msalign
C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.msalign
C:\toppic_tutorial\tutorial_3\ecoli_1_ms1.feature
C:\toppic_tutorial\tutorial_3\ecoli_2_ms1.feature
C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.feature
C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.feature
C:\toppic_tutorial\tutorial_3\ecoli_1_html
C:\toppic_tutorial\tutorial_3\ecoli_2_html
C:\toppic_tutorial\tutorial_3\ecoli_1_file
C:\toppic_tutorial\tutorial_3\ecoli_2_file
To speed up database search, follow the steps in Section 4.2.1 to generate index files for the database file uniprot-ecoli.fasta. If index files have been generated, it is not necessary to regenerate index files.
We use toppic_gui to search the MS/MS spectra in ecoli_1_ms2.msalign and ecoli_2_ms2.msalign against the protein database uniprot-ecoli.fasta to identify PrSMs.
toppic_gui.exe in the folder C:\toppic_tutorial\toppic.
C:\toppic_tutorial\tutorial_3\uniprot-ecoli.fasta as the protein database file.
C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.msalign and C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.msalign as mass spectrum data files.
C57 as the fixed modification.
Decoy database.
FDR as the spectrum level cutoff type.
FDR as the proteoform level cutoff type.
The screenshots of toppic_gui are shown below.
For each input msalign file, TopPIC reports two TSV files, an XML file, and collections of HTML files for identified proteoforms. The output files for ecoli_1_ms2.msalign and ecoli_2_ms2.msalign are
C:\toppic_tutorial\tutorial_3\ecoli_1_ms2_toppic_prsm.tsv
C:\toppic_tutorial\tutorial_3\ecoli_2_ms2_toppic_prsm.tsv
C:\toppic_tutorial\tutorial_3\ecoli_1_ms2_toppic_proteoform.tsv
C:\toppic_tutorial\tutorial_3\ecoli_2_ms2_toppic_proteoform.tsv
C:\toppic_tutorial\tutorial_3\ecoli_1_ms2_toppic_proteoform.xml
C:\toppic_tutorial\tutorial_3\ecoli_2_ms2_toppic_proteoform.xml
C:\toppic_tutorial\tutorial_3\ecoli_1_html\toppic_prsm_cutoff
C:\toppic_tutorial\tutorial_3\ecoli_2_html\toppic_prsm_cutoff
C:\toppic_tutorial\tutorial_3\ecoli_1_html\toppic_proteoform_cutoff
C:\toppic_tutorial\tutorial_3\ecoli_2_html\toppic_proteoform_cutoff
C:\toppic_tutorial\tutorial_3\ecoli_1_html\topmsv
C:\toppic_tutorial\tutorial_3\ecoli_2_html\topmsv
The output files can be downloaded here.
topdiff_gui.exe in the folder C:\toppic_tutorial\toppic.
C:\toppic_tutorial\tutorial_3\uniprot-ecoli.fasta as the protein database file.
C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.msalign and C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.msalign as mass spectrum data files.
C57 as the fixed modification.
The screenshots of topdiff_gui are shown below.
TopDiff reports one TSV file containing the identified proteoforms and their abundances in the input mass spectrum data files:
C:\toppic_tutorial\tutorial_3\sample_diff.tsv
The output file can be downloaded here.
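For further analysis outside a spreadsheet program, the TSV output can be loaded with a few lines of code. The sketch below reads tab-separated text into a list of dictionaries using Python's standard csv module and skips any lines that precede the header row. The function name is hypothetical, and the column names in the example are placeholders rather than the actual headers of sample_diff.tsv, which should be checked in the file itself.

```python
import csv


def read_tsv_table(text, first_column):
    """Parse tab-separated text into a list of row dictionaries.

    Lines before the header row are skipped. The header row is the first
    line whose first tab-separated field equals `first_column`.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.split("\t")[0] == first_column:
            return list(csv.DictReader(lines[index:], delimiter="\t"))
    raise ValueError(f"no header row starting with {first_column!r}")


# Placeholder data; real files have different column names.
sample_text = "note line\nProteoform ID\tAbundance A\nP1\t10.5\nP2\t3\n"
rows = read_tsv_table(sample_text, "Proteoform ID")
```

All values are returned as strings, so abundance columns need to be converted with `float` before any numerical comparison between the two runs.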
C:\toppic_tutorial\toppic\toppic.exe
C:\toppic_tutorial\tutorial_3\uniprot-ecoli.fasta
C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.msalign
C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.msalign
C:\toppic_tutorial\tutorial_3\ecoli_1_ms1.feature
C:\toppic_tutorial\tutorial_3\ecoli_2_ms1.feature
C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.feature
C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.feature
cd c:\toppic_tutorial\tutorial_3
..\toppic\topindex -f C57 -d uniprot-ecoli.fasta
..\toppic\toppic -f C57 -d -t FDR -T FDR uniprot-ecoli.fasta ecoli_*_ms2.msalign
C:\toppic_tutorial\toppic\topdiff.exe
C:\toppic_tutorial\tutorial_3\uniprot-ecoli.fasta
C:\toppic_tutorial\tutorial_3\ecoli_1_ms2.msalign
C:\toppic_tutorial\tutorial_3\ecoli_2_ms2.msalign
C:\toppic_tutorial\tutorial_3\ecoli_1_ms2_toppic_proteoform.xml
C:\toppic_tutorial\tutorial_3\ecoli_2_ms2_toppic_proteoform.xml
cd c:\toppic_tutorial\tutorial_3
..\toppic\topdiff -f C57 uniprot-ecoli.fasta ecoli_1_ms2.msalign ecoli_2_ms2.msalign