An AI-based approach driven by genotypes and phenotypes to uplift the diagnostic yield of genetic diseases.


Journal

Human genetics
ISSN: 1432-1203
Abbreviated title: Hum Genet
Country: Germany
NLM ID: 7613873

Publication information

Publication date:
23 Mar 2024
History:
received: 28 07 2023
accepted: 27 12 2023
medline: 23 3 2024
pubmed: 23 3 2024
entrez: 23 3 2024
Status: aheadofprint

Abstract

Identifying disease-causing variants in the genomes of rare disease patients is a challenging problem. To address it, we describe a machine learning framework, which we call "Suggested Diagnosis", that prioritizes the genetic variants in an exome or genome by their probability of being disease-causing. The method leverages the standard guidelines for germline variant interpretation defined by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP), together with inheritance information, phenotypic similarity, and variant quality. Starting from (1) the VCF file containing the proband's variants, (2) the list of the proband's phenotypes encoded as Human Phenotype Ontology terms, and optionally (3) information about family members, "Suggested Diagnosis" ranks all variants according to their machine learning prediction. This method significantly reduces the number of variants that geneticists need to evaluate by placing causative variants in the very first positions of the prioritized list. Most importantly, our approach proved to be among the top performers in the CAGI6 Rare Genome Project Challenge, where it ranked the true causative variant among the first positions and, uniquely among all challenge participants, increased the diagnostic yield by 12.5% by solving 2 undiagnosed cases.
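
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Variant` fields, the weights, the bias, and the logistic scoring function are all hypothetical placeholders standing in for a trained model, and the VCF and Human Phenotype Ontology inputs are assumed to have already been reduced to four feature scores between 0 and 1.

```python
from dataclasses import dataclass
from math import exp


@dataclass
class Variant:
    """One candidate variant with four illustrative feature scores in [0, 1]."""
    name: str
    acmg_score: float       # hypothetical evidence derived from ACMG/AMP criteria
    inheritance_fit: float  # hypothetical agreement with the expected inheritance mode
    phenotype_sim: float    # hypothetical similarity between proband HPO terms and the gene
    quality: float          # hypothetical variant call quality


# Placeholder weights and bias, chosen only for illustration (not taken from the paper).
WEIGHTS = {"acmg_score": 3.0, "inheritance_fit": 1.5, "phenotype_sim": 2.5, "quality": 1.0}
BIAS = -4.0


def predicted_probability(v: Variant) -> float:
    """Logistic stand-in for a trained classifier's disease-causing probability."""
    z = BIAS + sum(weight * getattr(v, feature) for feature, weight in WEIGHTS.items())
    return 1.0 / (1.0 + exp(-z))


def rank_variants(variants):
    """Return the variants sorted from most to least likely disease-causing."""
    return sorted(variants, key=predicted_probability, reverse=True)


# Made-up candidates; real inputs would come from a VCF file and HPO terms.
candidates = [
    Variant("variant_A", 0.2, 0.5, 0.1, 0.9),
    Variant("variant_B", 0.9, 1.0, 0.8, 0.95),
    Variant("variant_C", 0.6, 0.0, 0.7, 0.4),
]
ranked = rank_variants(candidates)
for position, v in enumerate(ranked, start=1):
    print(position, v.name, round(predicted_probability(v), 3))
```

In this sketch a geneticist would review the list from the top, which is how a well-calibrated ranking reduces the number of variants that need manual evaluation.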

Identifiers

pubmed: 38520562
doi: 10.1007/s00439-023-02638-x
pii: 10.1007/s00439-023-02638-x

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: EIC Accelerator
ID: 190164416

Copyright information

© 2024. The Author(s).

Authors

S Zucca (S)

enGenome Srl, 27100, Pavia, Italy.

G Nicora (G)

enGenome Srl, 27100, Pavia, Italy.
Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.

F De Paoli (F)

enGenome Srl, 27100, Pavia, Italy.

M G Carta (MG)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.

R Bellazzi (R)

enGenome Srl, 27100, Pavia, Italy.
Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.

P Magni (P)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy. paolo.magni@unipv.it.
University of Pavia, 27100, Pavia, Italy. paolo.magni@unipv.it.

E Rizzo (E)

enGenome Srl, 27100, Pavia, Italy.

I Limongelli (I)

enGenome Srl, 27100, Pavia, Italy.

MeSH classifications