A machine learning approach based on ACMG/AMP guidelines for genomic variant classification and prioritization.
Bayes Theorem
Cohort Studies
Computer Simulation
Genetic Predisposition to Disease
Genetic Testing / methods
Genetic Variation
Genome, Human
Genomics / methods
High-Throughput Nucleotide Sequencing / methods
Humans
Logistic Models
Machine Learning
Neoplasms / diagnosis
Practice Guidelines as Topic
Research Design
Software
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
15 02 2022
History:
received: 03 03 2021
accepted: 07 01 2022
entrez: 16 2 2022
pubmed: 17 2 2022
medline: 15 3 2022
Status:
epublish
Abstract
Genomic variant interpretation is a critical step of the diagnostic procedure, often supported by tools that may predict the damaging impact of each variant or provide a guidelines-based classification. We propose applying Machine Learning methodologies, in particular Penalized Logistic Regression, to support variant classification and prioritization. Our approach combines the ACMG/AMP guidelines for germline variant interpretation with variant annotation features and provides a probabilistic score of pathogenicity, thus supporting the prioritization and classification of variants that would be interpreted as uncertain by the ACMG/AMP guidelines. We compared different approaches in terms of variant prioritization and classification on different datasets, showing that our data-driven approach is able to resolve more variant of uncertain significance (VUS) cases than guidelines-based approaches and in silico prediction tools.
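As an illustration of the general idea in the abstract, the following is a minimal sketch of an L2-penalized logistic regression that turns a feature vector into a probabilistic pathogenicity score and ranks candidate variants by it. It is not the authors' implementation: the three features (count of triggered pathogenic criteria, count of triggered benign criteria, an in silico score), the toy training data, and the hyperparameters `l2`, `lr`, and `epochs` are all assumptions made for the example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_penalized_logreg(X, y, l2=0.1, lr=0.2, epochs=3000):
    """Fit logistic regression with an L2 (ridge) penalty by batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Mean log-loss gradient plus the ridge term; the intercept is not penalized.
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b

def pathogenicity_score(w, b, x):
    # Probability-like score in (0, 1); higher means more likely pathogenic.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy features per variant:
# [pathogenic criteria triggered, benign criteria triggered, in silico score]
X_train = [
    [2, 0, 0.90], [1, 0, 0.80], [3, 0, 0.70], [1, 0, 0.95],  # labeled pathogenic
    [0, 2, 0.10], [0, 1, 0.20], [0, 3, 0.05], [0, 1, 0.30],  # labeled benign
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_penalized_logreg(X_train, y_train)

# Hypothetical candidates with conflicting or missing criteria evidence.
candidates = {
    "variant_A": [1, 1, 0.85],
    "variant_B": [1, 1, 0.15],
    "variant_C": [0, 0, 0.50],
}
scores = {name: pathogenicity_score(w, b, x) for name, x in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(name, round(scores[name], 3))
```

The design point the sketch tries to convey is that a continuous score still orders variants that a rule-based combination of criteria would leave in the same uncertain class; the penalty term keeps the weights moderate when training data are few or perfectly separable.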
Identifiers
pubmed: 35169226
doi: 10.1038/s41598-022-06547-3
pii: 10.1038/s41598-022-06547-3
pmc: PMC8847497
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
2517
Copyright information
© 2022. The Author(s).