DeepLC can predict retention times for peptides that carry as-yet unseen modifications.


Journal

Nature methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604

Publication information

Publication date:
11 2021
History:
received: 15 04 2020
accepted: 13 09 2021
pubmed: 30 10 2021
medline: 29 12 2021
entrez: 29 10 2021
Status: ppublish

Abstract

The inclusion of peptide retention time prediction promises to remove peptide identification ambiguity in complex liquid chromatography-mass spectrometry identification workflows. However, due to the way peptides are encoded in current prediction models, accurate retention times cannot be predicted for modified peptides. This is especially problematic for fledgling open searches, which will benefit from accurate retention time prediction for modified peptides to reduce identification ambiguity. We present DeepLC, a deep learning peptide retention time predictor using peptide encoding based on atomic composition that allows the retention time of (previously unseen) modified peptides to be predicted accurately. We show that DeepLC performs similarly to current state-of-the-art approaches for unmodified peptides and, more importantly, accurately predicts retention times for modifications not seen during training. Moreover, we show that DeepLC's ability to predict retention times for any modification enables potentially incorrect identifications to be flagged in an open search of a wide variety of proteome data.
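The idea of encoding a peptide by atomic composition, so that a modification never seen in training is still representable, can be illustrated with a minimal sketch. This is not DeepLC's actual implementation: the names `ATOMS`, `RESIDUE_ATOMS` and `encode_peptide`, and the way modifications are passed in (a mapping from 0-based position to an atom-count delta), are assumptions made for illustration only. The residue compositions are the standard elemental formulas of amino acid residues.

```python
from collections import Counter

# Atom types tracked per residue (hypothetical choice for this sketch).
ATOMS = ("C", "H", "N", "O", "S", "P")

# Standard elemental composition of each amino acid residue (amino acid minus water).
RESIDUE_ATOMS = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},  "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},  "P": {"C": 5, "H": 7, "N": 1, "O": 1},
    "V": {"C": 5, "H": 9, "N": 1, "O": 1},  "T": {"C": 4, "H": 7, "N": 1, "O": 2},
    "C": {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1},
    "L": {"C": 6, "H": 11, "N": 1, "O": 1}, "I": {"C": 6, "H": 11, "N": 1, "O": 1},
    "N": {"C": 4, "H": 6, "N": 2, "O": 2},  "D": {"C": 4, "H": 5, "N": 1, "O": 3},
    "Q": {"C": 5, "H": 8, "N": 2, "O": 2},  "K": {"C": 6, "H": 12, "N": 2, "O": 1},
    "E": {"C": 5, "H": 7, "N": 1, "O": 3},
    "M": {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1},
    "H": {"C": 6, "H": 7, "N": 3, "O": 1},  "F": {"C": 9, "H": 9, "N": 1, "O": 1},
    "R": {"C": 6, "H": 12, "N": 4, "O": 1}, "Y": {"C": 9, "H": 9, "N": 1, "O": 2},
    "W": {"C": 11, "H": 10, "N": 2, "O": 1},
}


def encode_peptide(sequence, modifications=None):
    """Return one row of atom counts (ordered as ATOMS) per residue.

    `modifications` maps a 0-based residue position to the atom-count change
    the modification introduces, e.g. {"O": 1} for oxidation.
    """
    modifications = modifications or {}
    rows = []
    for position, residue in enumerate(sequence):
        counts = Counter(RESIDUE_ATOMS[residue])
        # Add the modification's atoms to the residue; deltas may be negative.
        counts.update(modifications.get(position, {}))
        rows.append([counts[atom] for atom in ATOMS])
    return rows


# Oxidised methionine (+O) and phosphoserine (+HPO3) need no dedicated symbol:
oxidised = encode_peptide("AMK", {1: {"O": 1}})
phospho = encode_peptide("S", {0: {"H": 1, "P": 1, "O": 3}})
```

Because a modified residue is just another vector of atom counts, a model trained on such rows receives a meaningful input for any modification whose elemental composition is known, which is the property the abstract relies on.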

Identifiers

pubmed: 34711972
doi: 10.1038/s41592-021-01301-5
pii: 10.1038/s41592-021-01301-5

Chemical substances

Peptide Fragments 0
Proteins 0
Proteome 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

1363-1369

Copyright information

© 2021. The Author(s), under exclusive licence to Springer Nature America, Inc.

Authors

Robbin Bouwmeester (R)

VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium.
Department of Biomolecular Medicine, Ghent University, Ghent, Belgium.

Ralf Gabriels (R)

VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium.
Department of Biomolecular Medicine, Ghent University, Ghent, Belgium.

Niels Hulstaert (N)

VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium.
Department of Biomolecular Medicine, Ghent University, Ghent, Belgium.

Lennart Martens (L)

VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium. lennart.martens@vib-ugent.be.
Department of Biomolecular Medicine, Ghent University, Ghent, Belgium. lennart.martens@vib-ugent.be.

Sven Degroeve (S)

VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium.
Department of Biomolecular Medicine, Ghent University, Ghent, Belgium.
