Rapid discrimination between deleterious and benign missense mutations in the CAGI 6 experiment.


Journal

Human Genomics
ISSN: 1479-7364
Abbreviated title: Hum Genomics
Country: England
NLM ID: 101202210

Publication information

Publication date:
27 Aug 2024
History:
received: 15 Jun 2023
accepted: 08 Aug 2024
medline: 28 Aug 2024
pubmed: 28 Aug 2024
entrez: 27 Aug 2024
Status: epublish

Abstract

We describe the machine learning tool that we applied in the CAGI 6 experiment to predict whether single-residue mutations in proteins are deleterious or benign. The tool was trained on single sequences only, i.e., without multiple sequence alignments or structural information; instead, it uses global characterizations of the protein sequence. Training and testing data for human gene mutations were obtained from ClinVar (ncbi.nlm.nih.gov/pub/ClinVar/), and for non-human gene mutations from UniProt (www.uniprot.org). Testing was done on ClinVar data added after training. This testing yielded a high area under the ROC curve (AUC) and Matthews correlation coefficient (MCC) for well-trained examples but showed low generalizability: for genes with either sparse or unbalanced training data, the prediction accuracy is poor. The resulting prediction server is available online at http://www.mamiris.com/Shoni.cagi6.
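The abstract summarizes performance with AUC and MCC and notes that accuracy is poor for genes with sparse or unbalanced training data. As an illustration only, the Python sketch below shows one common way to compute both metrics for binary deleterious/benign labels. It is not the authors' evaluation code; the function names `mcc` and `roc_auc`, the 1 = deleterious / 0 = benign encoding, the 0.5 decision threshold, the zero-denominator convention for MCC, and the toy data are all assumptions.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (assumed: 1 = deleterious, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal count is zero,
    # e.g. when every variant is predicted to be in the same class.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via its rank interpretation: the probability that a randomly chosen
    deleterious variant scores higher than a randomly chosen benign one (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (invented for illustration): 1 = deleterious, 0 = benign.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
    print(round(roc_auc(labels, scores), 3))  # 0.889
    print(round(mcc(labels, preds), 3))       # 0.707

    # Unbalanced toy gene: 8 deleterious, 2 benign, and a predictor that
    # calls every variant deleterious.
    skewed = [1] * 8 + [0] * 2
    always_deleterious = [1] * 10
    accuracy = sum(t == p for t, p in zip(skewed, always_deleterious)) / len(skewed)
    print(accuracy, mcc(skewed, always_deleterious))  # 0.8 0.0
```

In the unbalanced toy case, a predictor that labels every variant deleterious reaches 80% accuracy yet an MCC of 0, and AUC cannot be computed at all when a gene's test set contains only one class, which suggests why such metrics should be read with care for genes with skewed data.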

Identifiers

pubmed: 39192324
doi: 10.1186/s40246-024-00655-z
pii: 10.1186/s40246-024-00655-z

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

89

Grants

Agency: NIH HHS
ID: R01HG012117
Country: United States

Copyright information

© 2024. The Author(s).


Authors

Eshel Faraggi (E)

Research and Information Systems, LLC, 1620 E. 72nd ST., Indianapolis, IN, 46240, USA. efaraggi@gmail.com.
Physics Department, Indiana University Purdue University Indianapolis, Indianapolis, IN, 46202, USA. efaraggi@gmail.com.

Robert L Jernigan (RL)

Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, 50011, USA.

Andrzej Kloczkowski (A)

The Steve and Cindy Rasmussen Institute for Genomic Medicine, Columbus, OH, 43205, USA.
Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, OH, 43205, USA.
Department of Pediatrics, The Ohio State University, Columbus, OH, 43205, USA.
