INFERYS rescoring: Boosting peptide identifications and scoring confidence of database search results.


Journal

Rapid communications in mass spectrometry : RCM
ISSN: 1097-0231
Abbreviated title: Rapid Commun Mass Spectrom
Country: England
NLM ID: 8802365

Publication information

Publication date:
20 May 2021
History:
received: 20 01 2021
revised: 14 04 2021
accepted: 17 05 2021
pubmed: 21 5 2021
medline: 21 5 2021
entrez: 20 5 2021
Status: aheadofprint

Abstract

Database search engines for bottom-up proteomics largely ignore peptide fragment ion intensities during the automated scoring of tandem mass spectra against protein databases. Recent advances in deep learning allow the accurate prediction of peptide fragment ion intensities. Using these predictions to calculate additional intensity-based scores helps to overcome this drawback. Here, we describe a processing workflow termed INFERYS™ rescoring for the intensity-based rescoring of Sequest HT search engine results in Thermo Scientific™ Proteome Discoverer™ 2.5 software. The workflow is based on the deep learning platform INFERYS capable of predicting fragment ion intensities, which runs on personal computers without the need for graphics processing units. This workflow calculates intensity-based scores comparing peptide spectrum matches from Sequest HT and predicted spectra. Resulting scores are combined with classical search engine scores for input to the false discovery rate estimation tool Percolator. We demonstrate the merits of this approach by analyzing a classical HeLa standard sample and exemplify how this workflow leads to a better separation of target and decoy identifications, in turn resulting in increased peptide spectrum match, peptide and protein identification numbers. On an immunopeptidome dataset, this workflow leads to a 50% increase in identified peptides, emphasizing the advantage of intensity-based scores when analyzing low-intensity spectra or analytes with very similar physicochemical properties that require vast search spaces. Overall, the end-to-end integration of INFERYS rescoring enables simple and easy access to a powerful enhancement to classical database search engines, promising a deeper, more confident and more comprehensive analysis of proteomic data from any organism by unlocking the intensity dimension of tandem mass spectra for identification and more confident scoring.
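
As a rough illustration of what an intensity-based score can look like, the sketch below compares observed and predicted fragment ion intensities for one peptide spectrum match and pairs the result with a classical search engine score as a feature set for a downstream rescoring tool. The abstract does not state which similarity measure INFERYS rescoring uses; the normalized spectral angle shown here is an assumption chosen for illustration, and the function names and the feature key names are hypothetical.

```python
import math
from typing import Dict, Sequence


def spectral_angle(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Normalized spectral angle between two intensity vectors.

    Assumption: both vectors hold intensities for the same fragment ions in
    the same order. Returns 1.0 for identical intensity patterns and 0.0 for
    orthogonal ones (or when either vector is all zeros).
    """
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_obs = math.sqrt(sum(o * o for o in observed))
    norm_pred = math.sqrt(sum(p * p for p in predicted))
    if norm_obs == 0.0 or norm_pred == 0.0:
        return 0.0
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_obs * norm_pred)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def rescoring_features(
    search_engine_score: float,
    observed: Sequence[float],
    predicted: Sequence[float],
) -> Dict[str, float]:
    """Combine a classical score with an intensity-based score.

    Hypothetical feature layout; a real workflow would pass many more
    features per peptide spectrum match to the rescoring tool.
    """
    return {
        "search_engine_score": search_engine_score,
        "spectral_angle": spectral_angle(observed, predicted),
    }


if __name__ == "__main__":
    # Made-up intensities for four fragment ions of one candidate peptide.
    features = rescoring_features(2.7, [0.9, 0.1, 0.4, 0.0], [1.0, 0.2, 0.3, 0.0])
    print(features)
```

A correct match should yield a spectral angle close to 1, whereas a decoy or incorrect match with a similar fragment mass pattern would typically score lower, which is the kind of extra separation the abstract attributes to intensity-based scores.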

Identifiers

pubmed: 34015160
doi: 10.1002/rcm.9128

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

e9128

Grants

Agency: Thermo Scientific

Copyright information

© 2021 John Wiley & Sons Ltd.

Authors

Daniel P Zolg (DP)

MSAID GmbH, Garching b. München, Germany.

Siegfried Gessulat (S)

MSAID GmbH, Garching b. München, Germany.

Carmen Paschke (C)

Thermo Fisher Scientific (Bremen) GmbH, Bremen, Germany.

Michael Graber (M)

MSAID GmbH, Garching b. München, Germany.

Magnus Rathke-Kuhnert (M)

MSAID GmbH, Garching b. München, Germany.

Florian Seefried (F)

MSAID GmbH, Garching b. München, Germany.

Kai Fitzemeier (K)

Thermo Fisher Scientific (Bremen) GmbH, Bremen, Germany.

Frank Berg (F)

Thermo Fisher Scientific (Bremen) GmbH, Bremen, Germany.

Daniel Lopez-Ferrer (D)

Thermo Fisher Scientific, San Jose, CA, USA.

David Horn (D)

Thermo Fisher Scientific, San Jose, CA, USA.

Christoph Henrich (C)

Thermo Fisher Scientific (Bremen) GmbH, Bremen, Germany.

Andreas Huhmer (A)

Thermo Fisher Scientific, San Jose, CA, USA.

Bernard Delanghe (B)

Thermo Fisher Scientific (Bremen) GmbH, Bremen, Germany.

Martin Frejno (M)

MSAID GmbH, Garching b. München, Germany.

MeSH classifications