Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics.
Journal
PLoS computational biology
ISSN: 1553-7358
Abbreviated title: PLoS Comput Biol
Country: United States
NLM ID: 101238922
Publication information
Publication date:
01 2023
History:
received: 2022-08-03
accepted: 2023-01-04
revised: 2023-02-01
pubmed: 2023-01-21
medline: 2023-02-04
entrez: 2023-01-20
Status:
epublish
Abstract
Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing using bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows requires machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performance using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of peptides with highly diverse N- and C-termini generalized 76% better than the termini-restricted ones. Moreover, expanding the training set's size by adding peptides from the multienzymatic digestion, with five proteases, of samples from several species led to a 2-3 fold generalizability gain. Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial monoclonal antibodies (mAbs). MEMs extracted over 10,000 matching and overlapping peptides across mAb samples digested with six different proteases, achieving 100% sequence coverage for eight of the ten polypeptide chains. We anticipate that the MEMs' proven improvements to de novo analysis will positively impact several applications, such as the analysis of samples of high complexity or unknown nature, and the peptidomics field.
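The two metrics named in the abstract, peptide recall and sequence coverage, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy sequences, and the definition of peptide recall used here (the fraction of reference peptides predicted with an exactly matching sequence) are assumptions made for illustration only.

```python
from typing import Sequence


def peptide_recall(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of reference peptides whose prediction matches exactly.

    Assumption: predicted[i] is the de novo call for the spectrum whose
    true sequence is reference[i]; partial matches do not count.
    """
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def sequence_coverage(protein: str, peptides: Sequence[str]) -> float:
    """Fraction of protein residues covered by at least one peptide.

    Every occurrence of each peptide in the protein is marked, so
    overlapping peptides from different proteases fill each other's gaps.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


if __name__ == "__main__":
    # Hypothetical toy chain and peptides, not data from the study.
    chain = "ACDEFGHIKLMNPQR"
    one_protease = ["ACDEF", "LMNPQR"]
    two_proteases = one_protease + ["DEFGHIK"]
    print(sequence_coverage(chain, one_protease))
    print(sequence_coverage(chain, two_proteases))
    print(peptide_recall(["PEPTIDE", "ACDK", "LMNR"],
                         ["PEPTIDE", "ACDK", "LMNK"]))
```

In the toy example, the peptide from the second hypothetical protease bridges the gap left by the first digestion, which is the effect that makes overlapping peptides from multienzymatic digestion useful for full-chain coverage.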
Identifiers
pubmed: 36668672
doi: 10.1371/journal.pcbi.1010457
pii: PCOMPBIOL-D-22-01182
pmc: PMC9891523
Chemical substances
Peptides
0
Peptide Hydrolases
EC 3.4.-
Antibodies, Monoclonal
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e1010457
Copyright information
Copyright: © 2023 Gueto-Tettay et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
None of the listed authors have a conflict of interest.