DeepLC can predict retention times for peptides that carry as-yet unseen modifications.
Journal
Nature methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604
Publication information
Publication date: 11 2021
History:
received: 15 04 2020
accepted: 13 09 2021
pubmed: 30 10 2021
medline: 29 12 2021
entrez: 29 10 2021
Status: ppublish
Abstract
The inclusion of peptide retention time prediction promises to remove peptide identification ambiguity in complex liquid chromatography-mass spectrometry identification workflows. However, due to the way peptides are encoded in current prediction models, accurate retention times cannot be predicted for modified peptides. This is especially problematic for fledgling open searches, which will benefit from accurate retention time prediction for modified peptides to reduce identification ambiguity. We present DeepLC, a deep learning peptide retention time predictor using peptide encoding based on atomic composition that allows the retention time of (previously unseen) modified peptides to be predicted accurately. We show that DeepLC performs similarly to current state-of-the-art approaches for unmodified peptides and, more importantly, accurately predicts retention times for modifications not seen during training. Moreover, we show that DeepLC's ability to predict retention times for any modification enables potentially incorrect identifications to be flagged in an open search of a wide variety of proteome data.
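As an illustration of what an encoding based on atomic composition can look like, the sketch below turns each residue of a peptide into a vector of atom counts and adds a modification as a change in those counts. It is a minimal sketch under stated assumptions, not DeepLC's actual implementation: the function name `encode_peptide`, the `MOD_DELTAS` table, and the choice of six atom types are hypothetical and chosen only for this example. The point it shows is that a modification never seen during training still maps onto the same atom-count features as everything else.

```python
from typing import Dict, List, Optional

# Atom order of each feature vector (an assumption made for this sketch).
ATOMS = ("C", "H", "N", "O", "S", "P")

# Atomic composition of amino acid residues (amino acid minus H2O).
RESIDUES: Dict[str, Dict[str, int]] = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},
    "P": {"C": 5, "H": 7, "N": 1, "O": 1},
    "V": {"C": 5, "H": 9, "N": 1, "O": 1},
    "T": {"C": 4, "H": 7, "N": 1, "O": 2},
    "C": {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1},
    "L": {"C": 6, "H": 11, "N": 1, "O": 1},
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},
    "N": {"C": 4, "H": 6, "N": 2, "O": 2},
    "D": {"C": 4, "H": 5, "N": 1, "O": 3},
    "Q": {"C": 5, "H": 8, "N": 2, "O": 2},
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},
    "E": {"C": 5, "H": 7, "N": 1, "O": 3},
    "M": {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1},
    "H": {"C": 6, "H": 7, "N": 3, "O": 1},
    "F": {"C": 9, "H": 9, "N": 1, "O": 1},
    "R": {"C": 6, "H": 12, "N": 4, "O": 1},
    "Y": {"C": 9, "H": 9, "N": 1, "O": 2},
    "W": {"C": 11, "H": 10, "N": 2, "O": 1},
}

# Change in atom counts caused by a few common modifications.
# Hypothetical lookup table for this sketch; a real workflow would
# take these deltas from a resource such as Unimod.
MOD_DELTAS: Dict[str, Dict[str, int]] = {
    "Oxidation": {"O": 1},
    "Acetyl": {"C": 2, "H": 2, "O": 1},
    "Phospho": {"H": 1, "O": 3, "P": 1},
}


def encode_peptide(
    sequence: str, mods: Optional[Dict[int, str]] = None
) -> List[List[int]]:
    """Return one atom-count vector per residue, in ATOMS order.

    `mods` maps a zero-based residue position to a key of MOD_DELTAS.
    An unknown residue or modification name raises KeyError.
    """
    mods = mods or {}
    matrix: List[List[int]] = []
    for position, residue in enumerate(sequence):
        counts = dict(RESIDUES[residue])
        if position in mods:
            # A modification is just a shift in the residue's atom counts.
            for atom, delta in MOD_DELTAS[mods[position]].items():
                counts[atom] = counts.get(atom, 0) + delta
        matrix.append([counts.get(atom, 0) for atom in ATOMS])
    return matrix


if __name__ == "__main__":
    # Oxidised methionine at position 1 gains one oxygen atom.
    for row in encode_peptide("AMK", {1: "Oxidation"}):
        print(row)
```

Because a modified residue is described by the same atom counts as an unmodified one, a model trained on such vectors can, in principle, be queried for any modification whose elemental composition is known.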
Identifiers
pubmed: 34711972
doi: 10.1038/s41592-021-01301-5
pii: 10.1038/s41592-021-01301-5
Chemical substances
Peptide Fragments
0
Proteins
0
Proteome
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1363-1369
Copyright information
© 2021. The Author(s), under exclusive licence to Springer Nature America, Inc.