Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning.
Animals
Caenorhabditis elegans / metabolism
Databases, Protein
Deep Learning
Drosophila melanogaster / metabolism
HEK293 Cells
Humans
Neural Networks, Computer
Peptide Fragments / analysis
Peptide Library
Proteome / analysis
Saccharomyces cerevisiae / metabolism
Software
Tandem Mass Spectrometry / methods
Journal
Nature Methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604
Publication information
Publication date: 06 2019
History:
received: 20 08 2018
accepted: 18 04 2019
pubmed: 28 5 2019
medline: 10 7 2019
entrez: 29 5 2019
Status: ppublish
Abstract
In mass-spectrometry-based proteomics, the identification and quantification of peptides and proteins heavily rely on sequence database searching or spectral library matching. The lack of accurate predictive models for fragment ion intensities impairs the realization of the full potential of these approaches. Here, we extended the ProteomeTools synthetic peptide library to 550,000 tryptic peptides and 21 million high-quality tandem mass spectra. We trained a deep neural network, termed Prosit, resulting in chromatographic retention time and fragment ion intensity predictions that exceed the quality of the experimental data. Integrating Prosit into database search pipelines led to more identifications at >10× lower false discovery rates. We show the general applicability of Prosit by predicting spectra for proteases other than trypsin, generating spectral libraries for data-independent acquisition and improving the analysis of metaproteomes. Prosit is integrated into ProteomicsDB, allowing search result re-scoring and custom spectral library generation for any organism on the basis of peptide sequence alone.
Identifiers
pubmed: 31133760
doi: 10.1038/s41592-019-0426-7
pii: 10.1038/s41592-019-0426-7
Chemical substances
Peptide Fragments
0
Peptide Library
0
Proteome
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
509-518
Comments and corrections
Type: CommentIn
References
Aebersold, R. & Mann, M. Mass-spectrometric exploration of proteome structure and function. Nature 537, 347–355 (2016).
doi: 10.1038/nature19949
Zhang, Y., Fonslow, B. R., Shan, B., Baek, M.-C. & Yates, J. R. Protein analysis by shotgun/bottom-up proteomics. Chem. Rev. 113, 2343–2394 (2013).
doi: 10.1021/cr3003533
Mallick, P. & Kuster, B. Proteomics: a pragmatic perspective. Nat. Biotechnol. 28, 695 (2010).
doi: 10.1038/nbt.1658
Sinitcyn, P., Rudolph, J. D. & Cox, J. Computational methods for understanding mass spectrometry-based shotgun proteomics data. Annu. Rev. Biomed. Data Sci. 1, 207–234 (2018).
doi: 10.1146/annurev-biodatasci-080917-013516
Cox, J. et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805 (2011).
doi: 10.1021/pr101065j
Perkins, D. N., Pappin, D. J. C., Creasy, D. M. & Cottrell, J. S. Probability‐based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567 (1999).
doi: 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2
Eng, J. K., McCormack, A. L. & Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994).
doi: 10.1016/1044-0305(94)80016-2
Stein, S. E. & Scott, D. R. Optimization and testing of mass spectral library search algorithms for compound identification. J. Am. Soc. Mass Spectrom. 5, 859–866 (1994).
doi: 10.1016/1044-0305(94)87009-8
Lam, H. et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7, 655–667 (2007).
doi: 10.1002/pmic.200600625
Schubert, O. T. et al. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat. Protoc. 10, 426–441 (2015).
doi: 10.1038/nprot.2015.015
Deutsch, E. W. et al. Expanding the use of spectral libraries in proteomics. J. Proteome Res. 17, 4051–4060 (2018).
doi: 10.1021/acs.jproteome.8b00485
Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717 (2012).
doi: 10.1074/mcp.O111.016717
Lange, V., Picotti, P., Domon, B. & Aebersold, R. Selected reaction monitoring for quantitative proteomics: a tutorial. Mol. Syst. Biol. 4, 222 (2008).
doi: 10.1038/msb.2008.61
Bruderer, R., Bernhardt, O. M., Gandhi, T. & Reiter, L. High‐precision iRT prediction in the targeted analysis of data‐independent acquisition and its impact on identification and quantitation. Proteomics 16, 2246–2256 (2016).
doi: 10.1002/pmic.201500488
Krokhin, O. V. & Spicer, V. Generation of accurate peptide retention data for targeted and data independent quantitative LC–MS analysis: chromatographic lessons in proteomics. Proteomics 16, 2931–2936 (2016).
doi: 10.1002/pmic.201600283
Moruz, L. et al. Chromatographic retention time prediction for posttranslationally modified peptides. Proteomics 12, 1151–1159 (2012).
doi: 10.1002/pmic.201100386
Elias, J. E., Gibbons, F. D., King, O. D., Roth, F. P. & Gygi, S. P. Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nat. Biotechnol. 22, 214–219 (2004).
doi: 10.1038/nbt930
Arnold, R. J., Jayasankar, N., Aggarwal, D., Tang, H. & Radivojac, P. A machine learning approach to predicting peptide fragmentation spectra. Pac. Symp. Biocomput. 2006, 219–230 (2006).
Frank, A. M. Predicting intensity ranks of peptide fragment ions. J. Proteome Res. 8, 2226–2240 (2009).
doi: 10.1021/pr800677f
Degroeve, S., Maddelein, D. & Martens, L. MS2PIP prediction server: compute and visualize MS2 peak intensity predictions for CID and HCD fragmentation. Nucleic Acids Res. 43, W326–W330 (2015).
doi: 10.1093/nar/gkv542
Zhou, X.-X. et al. pDeep: predicting MS/MS spectra of peptides with deep learning. Anal. Chem. 89, 12690–12697 (2017).
doi: 10.1021/acs.analchem.7b02566
Zolg, D. et al. PROCAL: a set of 40 peptide standards for retention time indexing, column performance monitoring, and collision energy calibration. Proteomics 17, 1700263 (2017).
doi: 10.1002/pmic.201700263
Zolg, D. P. et al. Building ProteomeTools based on a complete synthetic human proteome. Nat. Methods 14, 259–262 (2017).
doi: 10.1038/nmeth.4153
Wu, Y. et al. Google’s neural machine translation system: bridging the gap between human and machine translation. Preprint at https://arxiv.org/abs/1609.08144 (2016).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Xu, K. et al. Show, attend and tell: neural image caption generation with visual attention. In Proc. International Conference on Machine Learning (eds. Bach, F. & Blei, D.) 2048–2057 (JMLR, 2015).
Krokhin, O. V. Sequence-specific retention calculator. Algorithm for peptide retention prediction in ion-pair RP-HPLC: application to 300- and 100-Å pore size C18 sorbents. Anal. Chem. 78, 7785–7795 (2006).
doi: 10.1021/ac060777w
Toprak, U. H. et al. Conserved peptide fragmentation as a benchmarking tool for mass spectrometers and a discriminating feature for targeted proteomics. Mol. Cell. Proteomics 13, 2056–2071 (2014).
doi: 10.1074/mcp.O113.036475
Diedrich, J. K., Pinto, A. F. M. & Yates, J. R. Energy dependence of HCD on peptide fragmentation: stepped collisional energy finds the sweet spot. J. Am. Soc. Mass Spectrom. 24, 1690–1699 (2013).
doi: 10.1007/s13361-013-0709-7
Bekker-Jensen, D. B. et al. An optimized shotgun strategy for the rapid generation of comprehensive human proteomes. Cell Syst. 4, 587–599 (2017).
doi: 10.1016/j.cels.2017.05.009
Bruderer, R. et al. Optimization of experimental parameters in data-independent mass spectrometry significantly increases depth and reproducibility of results. Mol. Cell. Proteomics 16, 2296–2309 (2017).
doi: 10.1074/mcp.RA117.000314
Fabre, B. et al. Spectral libraries for SWATH-MS assays for Drosophila melanogaster and Solanum lycopersicum. Proteomics 17, 1700216 (2017).
doi: 10.1002/pmic.201700216
Schmidt, T. et al. ProteomicsDB. Nucleic Acids Res. 46, D1271–D1281 (2017).
doi: 10.1093/nar/gkx1029
Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014).
doi: 10.1038/nature13319
Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).
doi: 10.1038/nmeth1019
The, M., MacCoss, M. J., Noble, W. S. & Käll, L. Fast and accurate protein false discovery rates on large-scale proteomics data sets with Percolator 3.0. J. Am. Soc. Mass Spectrom. 27, 1719–1727 (2016).
doi: 10.1007/s13361-016-1460-7
Shanmugam, A. K. & Nesvizhskii, A. I. Effective leveraging of targeted search spaces for improving peptide identification in tandem mass spectrometry based proteomics. J. Proteome Res. 14, 5169–5178 (2015).
doi: 10.1021/acs.jproteome.5b00504
Muth, T., Benndorf, D., Reichl, U., Rapp, E. & Martens, L. Searching for a needle in a stack of needles: challenges in metaproteomics data analysis. Mol. Biosyst. 9, 578–585 (2012).
doi: 10.1039/C2MB25415H
Rechenberger, J. et al. Challenges in clinical metaproteomics highlighted by the analysis of acute leukemia patients with gut colonization by multidrug-resistant Enterobacteriaceae. Proteomes 7, 2 (2019).
doi: 10.3390/proteomes7010002
Li, J. et al. An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. 32, 834 (2014).
doi: 10.1038/nbt.2942
Muth, T. R. et al. Navigating through metaproteomics data: a logbook of database searching. Proteomics 15, 3439–3453 (2015).
doi: 10.1002/pmic.201400560
Nesvizhskii, A. I. Proteogenomics: concepts, applications and computational strategies. Nat. Methods 11, 1114 (2014).
doi: 10.1038/nmeth.3144
Schumacher, F. R. et al. Building proteomic tool boxes to monitor MHC class I and class II peptides. Proteomics 17, 1600061 (2017).
doi: 10.1002/pmic.201600061
Zolg, D. et al. ProteomeTools: systematic characterization of 21 post-translational protein modifications by LC-MS/MS using synthetic peptides. Mol. Cell. Proteomics 17, 1850–1863 (2018).
doi: 10.1074/mcp.TIR118.000783
Wang, D. et al. A deep proteome and transcriptome abundance atlas of 29 healthy human tissues. Mol. Syst. Biol. 15, e8503 (2019).
doi: 10.15252/msb.20188503
Dorfer, V., Maltsev, S., Winkler, S. & Mechtler, K. CharmeRT: boosting peptide identifications by chimeric spectra identification and retention time prediction. J. Proteome Res. 17, 2581–2589 (2018).
doi: 10.1021/acs.jproteome.7b00836
Wenschuh, H. et al. Coherent membrane supports for parallel microsynthesis and screening of bioactive peptides. Pept. Sci. 55, 188–206 (2000).
doi: 10.1002/1097-0282(2000)55:3<188::AID-BIP20>3.0.CO;2-T
Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. Preprint at https://arxiv.org/abs/1412.3555 (2014).
Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. Preprint at https://arxiv.org/abs/1409.0473 (2014).
Moruz, L., Tomazela, D. & Käll, L. Training, selection, and robust calibration of retention time models for targeted proteomics. J. Proteome Res. 9, 5209–5216 (2010).
doi: 10.1021/pr1005058
Davis, S. et al. Expanding proteome coverage with CHarge Ordered Parallel Ion aNalysis (CHOPIN) combined with broad specificity proteolysis. J. Proteome Res. 16, 1288–1299 (2017).
doi: 10.1021/acs.jproteome.6b00915