Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project.
Best practices
Genome interpretation
Genome sequencing
Rare disease
Variant prioritization
Journal
Human genomics
ISSN: 1479-7364
Abbreviated title: Hum Genomics
Country: England
NLM ID: 101202210
Publication information
Publication date: 29 Apr 2024
History:
received: 11 Aug 2023
accepted: 02 Apr 2024
medline: 30 Apr 2024
pubmed: 30 Apr 2024
entrez: 29 Apr 2024
Status: epublish
Abstract
Abstract sections
BACKGROUND
A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50% of cases, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting.
METHODS
We used genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values.
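The maximum F-measure mentioned above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: predictions are pooled across families, every distinct EPCR value is tried as a threshold, and the function name, variant identifiers, and toy data are hypothetical. The challenge's actual scoring code may differ.

```python
from typing import Iterable


def max_f_measure(predictions: Iterable[tuple[str, float]], causal: set[str]) -> float:
    """Return the maximum F-measure over all EPCR thresholds.

    predictions: (variant_id, epcr) pairs, pooled across families (assumption).
    causal: identifiers of the known causal variants.
    """
    preds = list(predictions)
    if not preds or not causal:
        return 0.0
    best = 0.0
    # Each distinct EPCR value is treated as a candidate threshold.
    for threshold in sorted({score for _, score in preds}):
        called = {variant for variant, score in preds if score >= threshold}
        true_positives = len(called & causal)
        if true_positives == 0:
            continue
        precision = true_positives / len(called)
        recall = true_positives / len(causal)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical toy data: two families, one causal variant each.
toy_predictions = [
    ("fam1:varA", 0.9),
    ("fam1:varB", 0.4),
    ("fam2:varC", 0.7),
    ("fam2:varD", 0.2),
]
toy_causal = {"fam1:varA", "fam2:varD"}
toy_fmax = max_f_measure(toy_predictions, toy_causal)
```

In this toy case the best trade-off is reached either by calling only the top-scoring variant (perfect precision, half recall) or by calling all four (perfect recall, half precision).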
RESULTS
Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency.
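The "solved families within the top 5 ranked variants" figure can be illustrated with a short Python sketch. It assumes that a family counts as recalled when at least one of its causal variants appears among the k highest-ranked predictions; the abstract does not spell out this rule, and the function name and toy data are hypothetical.

```python
def families_recalled_in_top_k(ranked, causal, k=5):
    """Count families with at least one causal variant in their top-k predictions.

    ranked: dict mapping family ID to variant IDs ordered by decreasing EPCR.
    causal: dict mapping family ID to the set of known causal variant IDs.
    """
    return sum(
        1
        for family, truth in causal.items()
        if truth & set(ranked.get(family, [])[:k])
    )


# Hypothetical toy data: fam1 is recalled at rank 2, fam2 only at rank 6.
toy_ranked = {
    "fam1": ["v3", "v1", "v2"],
    "fam2": ["v9", "v8", "v7", "v6", "v5", "v4"],
}
toy_truth = {"fam1": {"v1"}, "fam2": {"v4"}}
recalled_top5 = families_recalled_in_top_k(toy_ranked, toy_truth, k=5)
recalled_top6 = families_recalled_in_top_k(toy_ranked, toy_truth, k=6)
```

For recessive cases with two causal variants, a stricter rule requiring both variants in the top k would be an equally plausible choice.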
CONCLUSIONS
Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria are needed.
Identifiers
pubmed: 38685113
doi: 10.1186/s40246-024-00604-w
pii: 10.1186/s40246-024-00604-w
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
44
Grants
Agency: NHGRI NIH HHS
ID: U24 HG007346
Country: United States
Agency: NHGRI NIH HHS
ID: UM1HG008900, U01HG011755, R01HG009141
Country: United States
Copyright information
© 2024. The Author(s).