Predicting the impact of rare variants on RNA splicing in CAGI6.


Journal

Human genetics
ISSN: 1432-1203
Abbreviated title: Hum Genet
Country: Germany
NLM ID: 7613873

Publication information

Publication date:
03 Jan 2024
History:
received: 15 Jun 2023
accepted: 18 Nov 2023
medline: 4 Jan 2024
pubmed: 4 Jan 2024
entrez: 3 Jan 2024
Status: aheadofprint

Abstract

Variants which disrupt splicing are a frequent cause of rare disease that have been under-ascertained clinically. Accurate and efficient methods to predict a variant's impact on splicing are needed to interpret the growing number of variants of unknown significance (VUS) identified by exome and genome sequencing. Here, we present the results of the CAGI6 Splicing VUS challenge, which invited predictions of the splicing impact of 56 variants ascertained clinically and functionally validated to determine splicing impact. The performance of 12 prediction methods, along with SpliceAI and CADD, was compared on the 56 functionally validated variants. The maximum accuracy achieved was 82% from two different approaches, one weighting SpliceAI scores by minor allele frequency, and one applying the recently published Splicing Prediction Pipeline (SPiP). SPiP performed optimally in terms of sensitivity, while an ensemble method combining multiple prediction tools and information from databases exceeded all others for specificity. Several challenge methods equalled or exceeded the performance of SpliceAI, with ultimate choice of prediction method likely to depend on experimental or clinical aims. One quarter of the variants were incorrectly predicted by at least 50% of the methods, highlighting the need for further improvements to splicing prediction methods for successful clinical application.
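The abstract compares methods by accuracy, sensitivity and specificity, and reports the share of variants that at least 50% of methods predicted incorrectly. A minimal sketch of how such figures can be computed from binary calls is given below. The function names, the method names and the four-variant toy data are hypothetical illustrations, not the challenge data or the authors' evaluation code.

```python
from typing import Dict, List


def confusion_metrics(truth: List[bool], predicted: List[bool]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary splice-impact calls.

    True means "variant affects splicing". Hypothetical helper, for illustration.
    """
    if len(truth) != len(predicted) or not truth:
        raise ValueError("truth and predicted must be non-empty and equal in length")
    tp = sum(t and p for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    return {
        "accuracy": (tp + tn) / len(truth),
        # Sensitivity: fraction of truly splice-affecting variants that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of non-affecting variants correctly called negative.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def hard_variant_fraction(
    truth: List[bool],
    predictions_by_method: Dict[str, List[bool]],
    min_wrong_fraction: float = 0.5,
) -> float:
    """Fraction of variants mispredicted by at least `min_wrong_fraction` of methods."""
    n_methods = len(predictions_by_method)
    if n_methods == 0 or not truth:
        raise ValueError("need at least one method and one variant")
    hard = 0
    for i, t in enumerate(truth):
        wrong = sum(preds[i] != t for preds in predictions_by_method.values())
        if wrong / n_methods >= min_wrong_fraction:
            hard += 1
    return hard / len(truth)


# Toy data only (assumption): four variants, two made-up methods.
toy_truth = [True, True, False, False]
toy_predictions = {
    "method_a": [True, False, False, False],
    "method_b": [True, False, False, True],
}

metrics_a = confusion_metrics(toy_truth, toy_predictions["method_a"])
hard_fraction = hard_variant_fraction(toy_truth, toy_predictions)
```

Reporting sensitivity and specificity alongside accuracy matters here because, as the abstract notes, different methods led on different measures, so the preferred tool depends on whether missed splice variants or false alarms are more costly for the intended use.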

Identifiers

pubmed: 38170232
doi: 10.1007/s00439-023-02624-3
pii: 10.1007/s00439-023-02624-3

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: NIHR
ID: RP-2016-07-011
Agency: New South Wales Health
ID: Cardiovascular Disease Senior Scientist Grant
Agency: University of Southampton
ID: Anniversary Fellowship

Copyright information

© 2024. The Author(s).

Authors

Jenny Lord (J)

Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK.

Carolina Jaramillo Oquendo (CJ)

Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK.

Htoo A Wai (HA)

Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK.

Andrew G L Douglas (AGL)

Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK.
Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, UK.

David J Bunyan (DJ)

Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK.
Wessex Regional Genetics Laboratory, Salisbury District Hospital, Salisbury, UK.

Yaqiong Wang (Y)

Center for Molecular Medicine, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, 201102, China.

Zhiqiang Hu (Z)

University of California, Berkeley, Berkeley, CA, 94720, USA.

Zishuo Zeng (Z)

Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, 08873, USA.

Daniel Danis (D)

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, 06032, USA.

Panagiotis Katsonis (P)

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.

Amanda Williams (A)

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.

Olivier Lichtarge (O)

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.

Yuchen Chang (Y)

Agnes Ginges Centre for Molecular Cardiology at Centenary Institute, University of Sydney, Sydney, Australia.
Faculty of Medicine and Health, University of Sydney, Sydney, Australia.

Richard D Bagnall (RD)

Agnes Ginges Centre for Molecular Cardiology at Centenary Institute, University of Sydney, Sydney, Australia.
Faculty of Medicine and Health, University of Sydney, Sydney, Australia.

Stephen M Mount (SM)

Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, USA.

Brynja Matthiasardottir (B)

Graduate Program in Biological Sciences and Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, USA.
Inflammatory Disease Section, National Human Genome Research Institute, Bethesda, MD, USA.

Chiaofeng Lin (C)

DNAnexus, Mountain View, CA, 94040, USA.

Thomas van Overeem Hansen (TVO)

Department of Clinical Genetics, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark.
Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

Raphael Leman (R)

Laboratoire de Biologie et Génétique du Cancer, Centre François Baclesse, Caen, France.
Inserm U1245, Cancer Brain and Genomics, Normandie Université, UNICAEN, FHU G4 génomique, Rouen, France.

Alexandra Martins (A)

Inserm U1245, Cancer Brain and Genomics, Normandie Université, UNIROUEN, FHU G4 génomique, Rouen, France.

Claude Houdayer (C)

Inserm U1245, Cancer Brain and Genomics, Normandie Université, UNIROUEN, FHU G4 génomique, Rouen, France.
Department of Genetics, Univ Rouen Normandie, INSERM U1245, FHU-G4 Génomique and CHU Rouen, 76000, Rouen, France.

Sophie Krieger (S)

Laboratoire de Biologie et Génétique du Cancer, Centre François Baclesse, Caen, France.
Inserm U1245, Cancer Brain and Genomics, Normandie Université, UNICAEN, FHU G4 génomique, Rouen, France.

Constantina Bakolitsa (C)

University of California, Berkeley, Berkeley, CA, 94720, USA.

Yisu Peng (Y)

Khoury College of Computer Sciences, Northeastern University, Boston, MA, 02115, USA.

Akash Kamandula (A)

Khoury College of Computer Sciences, Northeastern University, Boston, MA, 02115, USA.

Predrag Radivojac (P)

Khoury College of Computer Sciences, Northeastern University, Boston, MA, 02115, USA.

Diana Baralle (D)

Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK. d.baralle@soton.ac.uk.
Wessex Clinical Genetics Service, University Hospital Southampton NHS Foundation Trust, Southampton, UK. d.baralle@soton.ac.uk.

MeSH classifications