Critical assessment of variant prioritization methods for rare disease diagnosis within the Rare Genomes Project.

Best practices; Genome interpretation; Genome sequencing; Rare disease; Variant prioritization

Journal

medRxiv: the preprint server for health sciences
Abbreviated title: medRxiv
Country: United States
NLM ID: 101767986

Publication information

Publication date:
04 Aug 2023
History:
pubmed: 14 Aug 2023
medline: 14 Aug 2023
entrez: 14 Aug 2023
Status: epublish

Abstract

Background: A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50%. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery. Families are consented for sharing of sequence and phenotype data with researchers, allowing development of a Critical Assessment of Genome Interpretation (CAGI) community challenge that places variant prioritization models head-to-head in a real-life clinical diagnostic setting.

Methods: Predictors were provided a dataset of phenotype terms and variant calls from GS of 175 RGP individuals (65 families), including 35 solved training set families, with causal variants specified, and 30 test set families (14 solved, 16 unsolved). The challenge tasked teams with identifying the causal variants in as many test set families as possible. Ranked variant predictions were submitted with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of true positive causal variants, and the maximum F-measure, based on precision and recall of causal variants across EPCR thresholds.

Results: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top-performing teams recalled the causal variants in up to 13 of 14 solved families by prioritizing high-quality variant calls that were rare, predicted deleterious, segregating correctly, and consistent with the reported phenotype. In unsolved families, newly discovered diagnostic variants were returned to two families following confirmatory RNA sequencing, and two prioritized novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in

Conclusions: By objective assessment of variant predictions, we provide insights into current state-of-the-art algorithms and platforms for genome sequencing analysis for rare disease diagnosis and explore areas for future optimization. Identification of diagnostic variants in unsolved families promotes synergy between researchers with clinical and computational expertise as a means of advancing the field of clinical genome interpretation.
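
The maximum F-measure mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how predictions were pooled across families or how ties were handled, so this example assumes a single pooled list of (EPCR, is-causal) pairs and treats every distinct EPCR value as a candidate threshold. The function name `max_f_measure` and the toy data are hypothetical and are not taken from the challenge.

```python
from typing import List, Tuple


def max_f_measure(
    predictions: List[Tuple[float, bool]], n_causal: int
) -> Tuple[float, float]:
    """Return (best F-measure, EPCR threshold at which it is reached).

    predictions: (epcr, is_causal) pairs for all submitted variants
                 (assumption: pooled across families).
    n_causal:    number of true causal variants in the answer key.
    """
    best_f, best_t = 0.0, 1.0
    # Each distinct EPCR value is tried as a threshold, highest first.
    for t in sorted({epcr for epcr, _ in predictions}, reverse=True):
        # Variants at or above the threshold count as "called causal".
        called = [is_causal for epcr, is_causal in predictions if epcr >= t]
        tp = sum(called)
        if tp == 0:
            continue  # precision and recall are both zero here
        precision = tp / len(called)
        recall = tp / n_causal
        f = 2 * precision * recall / (precision + recall)
        if f > best_f:
            best_f, best_t = f, t
    return best_f, best_t


# Toy submission (made-up values): five ranked variants, three truly causal.
toy = [(0.95, True), (0.90, False), (0.80, True), (0.40, False), (0.10, True)]
best_f, best_t = max_f_measure(toy, n_causal=3)
print(f"max F-measure = {best_f:.2f} at EPCR >= {best_t}")
```

In this toy case, lowering the threshold far enough to recover all three causal variants gives the best trade-off, even though two false positives are admitted along the way.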

Identifiers

pubmed: 37577678
doi: 10.1101/2023.08.02.23293212
pmc: PMC10418577

Publication types

Preprint

Languages

eng

Grants

Agency: NHGRI NIH HHS
ID: U24 HG007346
Country: United States
Agency: NHGRI NIH HHS
ID: T32 HG010464
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG009141
Country: United States
Agency: NIGMS NIH HHS
ID: R35 GM124952
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG011755
Country: United States
Agency: NHGRI NIH HHS
ID: UM1 HG008900
Country: United States
Agency: NICHD NIH HHS
ID: R01 HD103805
Country: United States

Conflict of interest statement

Competing interests. Authors S.Z., I.L., E.R., P.M., and R.B. own shares of enGenome srl. Authors F.D.P. and G.N. are employees of enGenome srl. Authors T.J., R.S., S.G.V., N.S., A.R., U.S., and N.T. are employees of TCS Ltd. Authors P.J.C., C.K., K.N., and P.S. are employees of Invitae Ltd. H.L.R. receives support from Illumina and Microsoft for rare disease gene discovery and diagnosis. A.O’D-L. is a member of the scientific advisory board for Congenica Inc and the Simons Foundation SPARK for Autism study and co-chairs the clinical advisory board for CAGI. S.E.B. receives support at UC Berkeley from a research agreement with TCS. All other authors report no competing interests.

Authors

Sarah L Stenton (SL)

Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.

Melanie O'Leary (M)

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Gabrielle Lemire (G)

Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Grace E VanNoy (GE)

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Stephanie DiTroia (S)

Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Vijay S Ganesh (VS)

Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Emily Groopman (E)

Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Emily O'Heir (E)

Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Brian Mangilog (B)

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Ikeoluwa Osei-Owusu (I)

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Lynn S Pais (LS)

Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Jillian Serrano (J)

Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Moriel Singer-Berk (M)

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Ben Weisburd (B)

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Michael Wilson (M)

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Christina Austin-Tse (C)

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.

Marwa Abdelhakim (M)

Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Azza Althagafi (A)

Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
Computer Science Department, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia.

Giulia Babbi (G)

Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.

Riccardo Bellazzi (R)

enGenome Srl, Pavia, Italy.
Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.

Samuele Bovo (S)

Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy.

Maria Giulia Carta (MG)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.

Rita Casadio (R)

Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.

Pieter-Jan Coenen (PJ)

Invitae, San Francisco, California, USA.

Federica De Paoli (F)

enGenome Srl, Pavia, Italy.

Matteo Floris (M)

Department of Biomedical Sciences, University of Sassari, Sassari, Italy.

Manavalan Gajapathy (M)

Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA.
Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA.
Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA.

Robert Hoehndorf (R)

Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Julius O B Jacobsen (JOB)

William Harvey Research Institute, Barts & The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK.

Thomas Joseph (T)

TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India.

Akash Kamandula (A)

Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.

Panagiotis Katsonis (P)

Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA.

Cyrielle Kint (C)

Invitae, San Francisco, California, USA.

Olivier Lichtarge (O)

Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA.
Structural and Computational Biology & Molecular Biophysics Program, Baylor College of Medicine, Houston, TX, USA.
Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA.

Ivan Limongelli (I)

enGenome Srl, Pavia, Italy.

Yulan Lu (Y)

Center for molecular medicine, Pediatric Research Institute, Children's Hospital of Fudan University, Shanghai, China.

Paolo Magni (P)

enGenome Srl, Pavia, Italy.
Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.

Tarun Karthik Kumar Mamidi (TKK)

Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA.
Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA.
Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA.

Pier Luigi Martelli (PL)

Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.

Marta Mulargia (M)

Department of Biomedical Sciences, University of Sassari, Sassari, Italy.

Giovanna Nicora (G)

enGenome Srl, Pavia, Italy.

Keith Nykamp (K)

Invitae, San Francisco, California, USA.

Vikas Pejaver (V)

Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Yisu Peng (Y)

Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.

Thi Hong Cam Pham (THC)

Anatomy and Surgical Training Department, University of Medicine and Pharmacy, Hue University, Vietnam.

Maurizio S Podda (MS)

Department of Biomedical Sciences, University of Sassari, Sassari, Italy.

Aditya Rao (A)

TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India.

Ettore Rizzo (E)

enGenome Srl, Pavia, Italy.

Vangala G Saipradeep (VG)

TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India.

Castrense Savojardo (C)

Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.

Peter Schols (P)

Invitae, San Francisco, California, USA.

Yang Shen (Y)

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.
Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA.
Institute of Biosciences and Technology and Department of Translational Medical Sciences, College of Medicine, Texas A&M University, Houston, Texas, USA.

Naveen Sivadasan (N)

TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India.

Damian Smedley (D)

William Harvey Research Institute, Barts & The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK.

Dorian Soru (D)

Independent consultant.

Rajgopal Srinivasan (R)

TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India.

Yuanfei Sun (Y)

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.

Uma Sunderam (U)

TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India.

Wuwei Tan (W)

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.

Naina Tiwari (N)

TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India.

Xiao Wang (X)

Center for molecular medicine, Pediatric Research Institute, Children's Hospital of Fudan University, Shanghai, China.

Yaqiong Wang (Y)

Center for molecular medicine, Pediatric Research Institute, Children's Hospital of Fudan University, Shanghai, China.

Amanda Williams (A)

Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA.

Elizabeth A Worthey (EA)

Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA.
Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA.
Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA.

Rujie Yin (R)

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.

Yuning You (Y)

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.

Daniel Zeiberg (D)

Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.

Susanna Zucca (S)

enGenome Srl, Pavia, Italy.

Constantina Bakolitsa (C)

Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, USA.

Steven E Brenner (SE)

Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, USA.

Stephanie M Fullerton (SM)

Department of Bioethics & Humanities, University of Washington School of Medicine, Seattle, WA, USA.

Predrag Radivojac (P)

Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.

Heidi L Rehm (HL)

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.

Anne O'Donnell-Luria (A)

Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.

MeSH classifications