Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning.


Journal

Nature methods
ISSN: 1548-7105
Titre abrégé: Nat Methods
Pays: United States
ID NLM: 101215604

Informations de publication

Date de publication:
06 2019
Historique:
received: 20 08 2018
accepted: 18 04 2019
pubmed: 28 5 2019
medline: 10 7 2019
entrez: 29 5 2019
Statut: ppublish

Résumé

In mass-spectrometry-based proteomics, the identification and quantification of peptides and proteins heavily rely on sequence database searching or spectral library matching. The lack of accurate predictive models for fragment ion intensities impairs the realization of the full potential of these approaches. Here, we extended the ProteomeTools synthetic peptide library to 550,000 tryptic peptides and 21 million high-quality tandem mass spectra. We trained a deep neural network, termed Prosit, resulting in chromatographic retention time and fragment ion intensity predictions that exceed the quality of the experimental data. Integrating Prosit into database search pipelines led to more identifications at >10× lower false discovery rates. We show the general applicability of Prosit by predicting spectra for proteases other than trypsin, generating spectral libraries for data-independent acquisition and improving the analysis of metaproteomes. Prosit is integrated into ProteomicsDB, allowing search result re-scoring and custom spectral library generation for any organism on the basis of peptide sequence alone.

Identifiants

pubmed: 31133760
doi: 10.1038/s41592-019-0426-7
pii: 10.1038/s41592-019-0426-7
doi:

Substances chimiques

Peptide Fragments 0
Peptide Library 0
Proteome 0

Types de publication

Journal Article Research Support, Non-U.S. Gov't

Langues

eng

Sous-ensembles de citation

IM

Pagination

509-518

Commentaires et corrections

Type : CommentIn

Références

Aebersold, R. & Mann, M. Mass-spectrometric exploration of proteome structure and function. Nature 537, 347–355 (2016).
doi: 10.1038/nature19949
Zhang, Y., Fonslow, B. R., Shan, B., Baek, M.-C. & Yates, J. R. Protein analysis by shotgun/bottom-up proteomics. Chem. Rev. 113, 2343–2394 (2013).
doi: 10.1021/cr3003533
Mallick, P. & Kuster, B. Proteomics: a pragmatic perspective. Nat. Biotechnol. 28, 695 (2010).
doi: 10.1038/nbt.1658
Sinitcyn, P., Rudolph, J. D. & Cox, J. Computational methods for understanding mass spectrometry-based shotgun proteomics data. Annu. Rev. Biomed. Data Sci. 1, 207–234 (2018).
doi: 10.1146/annurev-biodatasci-080917-013516
Cox, J. et al. Andromeda: a peptide search engine integrated into the maxquant environment. J. Proteome Res. 10, 1794–1805 (2011).
doi: 10.1021/pr101065j
Perkins, D. N., Pappin, D. J. C., Creasy, D. M. & Cottrell, J. S. Probability‐based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567 (1999).
doi: 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2
Eng, J. K., McCormack, A. L. & Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994).
doi: 10.1016/1044-0305(94)80016-2
Stein, S. E. & Scott, D. R. Optimization and testing of mass spectral library search algorithms for compound identification. J. Am. Soc. Mass Spectrom. 5, 859–866 (1994).
doi: 10.1016/1044-0305(94)87009-8
Lam, H. et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7, 655–667 (2007).
doi: 10.1002/pmic.200600625
Schubert, O. T. et al. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat. Protoc. 10, 426–441 (2015).
doi: 10.1038/nprot.2015.015
Deutsch, E. W. et al. Expanding the use of spectral libraries in proteomics. J. Proteome Res. 17, 4051–4060 (2018).
doi: 10.1021/acs.jproteome.8b00485
Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717 (2012).
doi: 10.1074/mcp.O111.016717
Lange, V., Picotti, P., Domon, B. & Aebersold, R. Selected reaction monitoring for quantitative proteomics: a tutorial. Mol. Syst. Biol. 4, 222 (2008).
doi: 10.1038/msb.2008.61
Bruderer, R., Bernhardt, O. M., Gandhi, T. & Reiter, L. High‐precision iRT prediction in the targeted analysis of data‐independent acquisition and its impact on identification and quantitation. Proteomics 16, 2246–2256 (2016).
doi: 10.1002/pmic.201500488
Krokhin, O. V. & Spicer, V. Generation of accurate peptide retention data for targeted and data independent quantitative LC–MS analysis: chromatographic lessons in proteomics. Proteomics 16, 2931–2936 (2016).
doi: 10.1002/pmic.201600283
Moruz, L. et al. Chromatographic retention time prediction for posttranslationally modified peptides. Proteomics 12, 1151–1159 (2012).
doi: 10.1002/pmic.201100386
Elias, J. E., Gibbons, F. D., King, O. D., Roth, F. P. & Gygi, S. P. Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nat. Biotechnol. 22, 214–219 (2004).
doi: 10.1038/nbt930
Arnold, R. J., Jayasankar, N., Aggarwal, D., Tang, H. & Radivojac, P. A machine learning approach to predicting peptide fragmentation spectra. Pac. Symp. Biocomput. 2006, 219–230 (2006).
Frank, A. M. Predicting intensity ranks of peptide fragment ions. J. Proteome Res. 8, 2226–2240 (2009).
doi: 10.1021/pr800677f
Degroeve, S., Maddelein, D. & Martens, L. MS2PIP prediction server: compute and visualize MS2 peak intensity predictions for CID and HCD fragmentation. Nucleic Acids Res. 43, W326–W330 (2015).
doi: 10.1093/nar/gkv542
Zhou, X.-X. et al. pDeep: predicting MS/MS spectra of peptides with deep learning. Anal. Chem. 89, 12690–12697 (2017).
doi: 10.1021/acs.analchem.7b02566
Zolg, D. et al. PROCAL: a set of 40 peptide standards for retention time indexing, column performance monitoring, and collision energy calibration. Proteomics 17, 1700263 (2017).
doi: 10.1002/pmic.201700263
Zolg, D. P. et al. Building ProteomeTools based on a complete synthetic human proteome. Nat. Methods 14, 259–262 (2017).
doi: 10.1038/nmeth.4153
Wu, Y. et al. Google’s neural machine translation system: bridging the gap between human and machine translation. Preprint at https://arxiv.org/abs/1609.08144 (2016).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Xu, K. et al. Show, attend and tell: neural image caption generation with visual attention. In Proc. International Conference on Machine Learning (eds. Bach, F. & Blei, D.) 2048–2057 (JMLR, 2015).
Krokhin, O. V. Sequence-specific retention calculator. Algorithm for peptide retention prediction in ion-pair RP-HPLC: application to 300- and 100-A pore size C18 sorbents. Anal. Chem. 78, 7785–7795 (2006).
doi: 10.1021/ac060777w
Toprak, U. H. et al. Conserved peptide fragmentation as a benchmarking tool for mass spectrometers and a discriminating feature for targeted proteomics. Mol. Cell. Proteomics 13, 2056–2071 (2014).
doi: 10.1074/mcp.O113.036475
Diedrich, J. K., Pinto, A. F. M. & Yates, J. R. Energy dependence of HCD on peptide fragmentation: stepped collisional energy finds the sweet spot. J. Am. Soc. Mass Spectrom. 24, 1690–1699 (2013).
doi: 10.1007/s13361-013-0709-7
Bekker-Jensen, D. B. et al. An optimized shotgun strategy for the rapid generation of comprehensive human proteomes. Cell Syst. 4, 587–599 (2017).
doi: 10.1016/j.cels.2017.05.009
Bruderer, R. et al. Optimization of experimental parameters in data-independent mass spectrometry significantly increases depth and reproducibility of results. Mol. Cell. Proteomics 16, 2296–2309 (2017).
doi: 10.1074/mcp.RA117.000314
Fabre, B. et al. Spectral libraries for SWATH-MS assays for Drosophila melanogaster and Solanum lycopersicum. Proteomics 17, 1700216 (2017).
doi: 10.1002/pmic.201700216
Schmidt, T. et al. ProteomicsDB. Nucleic Acids Res. 46, D1271–D1281 (2017).
doi: 10.1093/nar/gkx1029
Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014).
doi: 10.1038/nature13319
Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).
doi: 10.1038/nmeth1019
The, M., MacCoss, M. J., Noble, W. S. & Käll, L. Fast and accurate protein false discovery rates on large-scale proteomics data sets with Percolator 3.0. J. Am. Soc. Mass Spectrom. 27, 1719–1727 (2016).
doi: 10.1007/s13361-016-1460-7
Shanmugam, A. K. & Nesvizhskii, A. I. Effective leveraging of targeted search spaces for improving peptide identification in tandem mass spectrometry based proteomics. J. Proteome Res. 14, 5169–5178 (2015).
doi: 10.1021/acs.jproteome.5b00504
Muth, T., Benndorf, D., Reichl, U., Rapp, E. & Martens, L. Searching for a needle in a stack of needles: challenges in metaproteomics data analysis. Mol. Biosyst. 9, 578–585 (2012).
doi: 10.1039/C2MB25415H
Rechenberger, J. et al. Challenges in clinical metaproteomics highlighted by the analysis of acute leukemia patients with gut colonization by multidrug-resistant enterobacteriaceae. Proteomes 7, 2 (2019).
doi: 10.3390/proteomes7010002
Li, J. et al. An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. 32, 834 (2014).
doi: 10.1038/nbt.2942
Muth, T. R. et al. Navigating through metaproteomics data: a logbook of database searching. Proteomics 15, 3439–3453 (2017).
doi: 10.1002/pmic.201400560
Nesvizhskii, A. I. Proteogenomics: concepts, applications and computational strategies. Nat. Methods 11, 1114 (2014).
doi: 10.1038/nmeth.3144
Schumacher, F. R. et al. Building proteomic tool boxes to monitor MHC class I and class II peptides. Proteomics 17, 1600061 (2017).
doi: 10.1002/pmic.201600061
Zolg, D. et al. ProteomeTools: systematic characterization of 21 post-translational protein modifications by LC-MS/MS using synthetic peptides. Mol. Cell. Proteomics 17, 1850–1863 (2018).
doi: 10.1074/mcp.TIR118.000783
Wang, D. et al. A deep proteome and transcriptome abundance atlas of 29 healthy human tissues. Mol. Syst. Biol. 15, e8503 (2019).
doi: 10.15252/msb.20188503
Dorfer, V., Maltsev, S., Winkler, S. & Mechtler, K. CharmeRT: boosting peptide identifications by chimeric spectra identification and retention time prediction. J. Proteome Res. 17, 2581–2589 (2018).
doi: 10.1021/acs.jproteome.7b00836
Wenschuh, H. et al. Coherent membrane supports for parallel microsynthesis and screening of bioactive peptides. Pept. Sci. 55, 188–206 (2000).
doi: 10.1002/1097-0282(2000)55:3<188::AID-BIP20>3.0.CO;2-T
Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. Preprint at https://arxiv.org/abs/1412.3555 (2014).
Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. Preprint at https://arxiv.org/abs/1409.0473 (2014).
Moruz, L., Tomazela, D. & Käll, L. Training, selection, and robust calibration of retention time models for targeted proteomics. J. Proteome Res. 9, 5209–5216 (2010).
doi: 10.1021/pr1005058
Davis, S. et al. Expanding proteome coverage with CHarge Ordered Parallel Ion aNalysis (CHOPIN) combined with broad specificity proteolysis. J. Proteome Res. 16, 1288–1299 (2017).
doi: 10.1021/acs.jproteome.6b00915

Auteurs

Siegfried Gessulat (S)

Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany.
SAP SE, Potsdam, Germany.

Tobias Schmidt (T)

Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany.

Daniel Paul Zolg (DP)

Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany.

Patroklos Samaras (P)

Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany.

Karsten Schnatbaum (K)

JPT Peptide Technologies GmbH, Berlin, Germany.

Johannes Zerweck (J)

JPT Peptide Technologies GmbH, Berlin, Germany.

Tobias Knaute (T)

JPT Peptide Technologies GmbH, Berlin, Germany.

Julia Rechenberger (J)

Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany.

Bernard Delanghe (B)

Thermo Fisher Scientific, Bremen, Germany.

Andreas Huhmer (A)

Thermo Fisher Scientific, San Jose, CA, USA.

Ulf Reimer (U)

JPT Peptide Technologies GmbH, Berlin, Germany.

Hans-Christian Ehrlich (HC)

SAP SE, Potsdam, Germany.

Bernhard Kuster (B)

Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany. kuster@tum.de.
Bavarian Center for Biomolecular Mass Spectrometry, Freising, Germany. kuster@tum.de.

Mathias Wilhelm (M)

Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany. mathias.wilhelm@tum.de.

Articles similaires

[Redispensing of expensive oral anticancer medicines: a practical application].

Lisanne N van Merendonk, Kübra Akgöl, Bastiaan Nuijen
1.00
Humans Antineoplastic Agents Administration, Oral Drug Costs Counterfeit Drugs

Smoking Cessation and Incident Cardiovascular Disease.

Jun Hwan Cho, Seung Yong Shin, Hoseob Kim et al.
1.00
Humans Male Smoking Cessation Cardiovascular Diseases Female
Humans United States Aged Cross-Sectional Studies Medicare Part C
1.00
Humans Yoga Low Back Pain Female Male

Classifications MeSH