Predicting glycan structure from tandem mass spectrometry via deep learning.
Journal
Nature Methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604
Publication information
Publication date: 01 Jul 2024
History:
received: 13 Jun 2023
accepted: 17 May 2024
medline: 02 Jul 2024
pubmed: 02 Jul 2024
entrez: 01 Jul 2024
Status: aheadofprint
Abstract
Glycans constitute the most complicated post-translational modification, modulating protein activity in health and disease. However, structural annotation from tandem mass spectrometry (MS/MS) data is a bottleneck in glycomics, preventing high-throughput endeavors and relegating glycomics to a few experts. Trained on a newly curated set of 500,000 annotated MS/MS spectra, here we present CandyCrunch, a dilated residual neural network predicting glycan structure from raw liquid chromatography-MS/MS data in seconds (top-1 accuracy: 90.3%). We developed an open-access Python-based workflow of raw data conversion and prediction, followed by automated curation and fragment annotation, with predictions recapitulating and extending expert annotation. We demonstrate that this can be used for de novo annotation, diagnostic fragment identification and high-throughput glycomics. For maximum impact, this entire pipeline is tightly interlaced with our glycowork platform and can be easily tested at https://colab.research.google.com/github/BojarLab/CandyCrunch/blob/main/CandyCrunch.ipynb. We envision CandyCrunch to democratize structural glycomics and the elucidation of biological roles of glycans.
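Illustrative note: the abstract describes CandyCrunch as a dilated residual neural network. The sketch below is not the CandyCrunch architecture or API; it is a minimal pure-Python illustration, under stated assumptions, of what a single dilated residual block does to a binned spectrum. The function names, the toy intensities, and the kernel weights are all hypothetical and chosen only for this example.

```python
def dilated_conv1d(signal, kernel, dilation):
    """1D cross-correlation with zero 'same' padding and a dilation factor."""
    n, k = len(signal), len(kernel)
    half = (k - 1) // 2  # centre the kernel so output length equals input length
    out = []
    for i in range(n):
        total = 0.0
        for j in range(k):
            # Kernel taps are spaced `dilation` bins apart.
            idx = i + (j - half) * dilation
            if 0 <= idx < n:  # positions outside the signal count as zero
                total += kernel[j] * signal[idx]
        out.append(total)
    return out


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dilated_residual_block(signal, kernel, dilation):
    """y = x + ReLU(dilated_conv(x)); the skip connection adds the input back."""
    conv = relu(dilated_conv1d(signal, kernel, dilation))
    return [x + c for x, c in zip(signal, conv)]


# Toy binned spectrum with two peaks three bins apart (hypothetical values).
spectrum = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
kernel = [1.0, 0.0, 1.0]  # hypothetical weights: sum of the two outer taps

out_d1 = dilated_residual_block(spectrum, kernel, dilation=1)
out_d3 = dilated_residual_block(spectrum, kernel, dilation=3)
print(out_d1)
print(out_d3)
```

With dilation 1 the three-tap kernel only sees immediate neighbours, whereas with dilation 3 the same kernel relates the two peaks that lie three bins apart, which is the general motivation for dilation: a wider receptive field without more weights. The residual term keeps the original intensities in the output.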
Identifiers
pubmed: 38951670
doi: 10.1038/s41592-024-02314-6
pii: 10.1038/s41592-024-02314-6
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: Vetenskapsrådet (Swedish Research Council)
ID: BioMS
Agency: Science Foundation Ireland (SFI)
ID: 20/FFP-P/8809
Copyright information
© 2024. The Author(s).