A machine learning approach based on ACMG/AMP guidelines for genomic variant classification and prioritization.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
15 02 2022
History:
received: 03 03 2021
accepted: 07 01 2022
entrez: 16 2 2022
pubmed: 17 2 2022
medline: 15 3 2022
Status: epublish

Abstract

Genomic variant interpretation is a critical step of the diagnostic procedure, often supported by tools that predict the damaging impact of each variant or provide a guidelines-based classification. We propose applying machine learning methods, in particular penalized logistic regression, to support variant classification and prioritization. Our approach combines the ACMG/AMP guidelines for germline variant interpretation with variant annotation features and provides a probabilistic pathogenicity score, thus supporting the prioritization and classification of variants that the ACMG/AMP guidelines would interpret as uncertain. We compared different approaches to variant prioritization and classification on different datasets, showing that our data-driven approach resolves more variants of uncertain significance (VUS) than guidelines-based approaches and in silico prediction tools.
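
As an illustration of the idea described in the abstract, the following is a minimal sketch of penalized logistic regression that maps triggered ACMG/AMP criteria to a probabilistic pathogenicity score and uses that score to rank uncertain variants. It is not the authors' implementation: the choice of an L2 penalty, the hyperparameters, the selection of criteria used as features, the variant names, and the toy training data are all assumptions made only for this example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_penalized_logreg(X, y, l2=0.1, lr=0.5, epochs=2000):
    """Fit logistic regression with an L2 penalty by batch gradient descent.

    l2, lr and epochs are illustrative values, not taken from the paper.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Mean log-loss gradient plus penalty gradient; the intercept is not penalized.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def pathogenicity_score(w, b, x):
    # Probability-like score in (0, 1); higher means more likely pathogenic.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical feature layout: 1 if the ACMG/AMP criterion is triggered, else 0.
FEATURES = ["PVS1", "PM2", "PP3", "BS1", "BP4"]

# Toy, made-up training set (1 = pathogenic, 0 = benign); not real variant data.
X_train = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 1, 1],
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_penalized_logreg(X_train, y_train)

# Hypothetical variants that a rule-based classification would leave uncertain.
vus = {
    "variant_A": [0, 1, 1, 0, 0],  # PM2 + PP3
    "variant_B": [0, 1, 0, 0, 1],  # PM2 + BP4 (conflicting evidence)
}
scores = {name: pathogenicity_score(w, b, x) for name, x in vus.items()}

# Prioritization: sort uncertain variants by decreasing score.
ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(f"{name}: {scores[name]:.3f}")
```

In this sketch the penalty keeps the coefficients finite even though the toy data are linearly separable, which is one common reason to prefer a penalized fit when the number of labeled variants is small relative to the number of features.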

Identifiers

pubmed: 35169226
doi: 10.1038/s41598-022-06547-3
pii: 10.1038/s41598-022-06547-3
pmc: PMC8847497

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2517

Copyright information

© 2022. The Author(s).

Authors

Giovanna Nicora (G)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
enGenome S.R.L., Pavia, Italy.

Susanna Zucca (S)

enGenome S.R.L., Pavia, Italy.

Ivan Limongelli (I)

enGenome S.R.L., Pavia, Italy.

Riccardo Bellazzi (R)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.

Paolo Magni (P)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy. paolo.magni@unipv.it.
