SpliceTransformer predicts tissue-specific splicing linked to human diseases.


Journal

Nature Communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
23 Oct 2024
History:
received: 27 Nov 2023
accepted: 24 Sep 2024
medline: 24 Oct 2024
pubmed: 24 Oct 2024
entrez: 23 Oct 2024
Status: epublish

Abstract

We present SpliceTransformer (SpTransformer), a deep-learning framework that predicts tissue-specific RNA splicing alterations linked to human diseases based on genomic sequence. SpTransformer outperforms all previous methods on splicing prediction. Application to approximately 1.3 million genetic variants in the ClinVar database reveals that splicing alterations account for 60% of intronic and synonymous pathogenic mutations, and occur at different frequencies across tissue types. Importantly, tissue-specific splicing alterations match their clinical manifestations independent of gene expression variation. We validate the enrichment in three brain disease datasets involving over 164,000 individuals. Additionally, we identify single nucleotide variations that cause brain-specific splicing alterations, and find disease-associated genes harboring these single nucleotide variations with distinct expression patterns involved in diverse biological processes. Finally, SpTransformer analysis of whole-exome sequencing data from blood samples of patients with diabetic nephropathy predicts kidney-specific RNA splicing alterations with 83% accuracy, demonstrating the potential to infer disease-causing tissue-specific splicing events. SpTransformer provides a powerful tool to guide biological and clinical interpretations of human diseases.

Identifiers

pubmed: 39443442
doi: 10.1038/s41467-024-53088-6
pii: 10.1038/s41467-024-53088-6

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

9129

Copyright information

© 2024. The Author(s).

Authors

Ningyuan You (N)

Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China.

Chang Liu (C)

Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China.

Yuxin Gu (Y)

Institute of Genetics, Zhejiang University School of Medicine, Hangzhou, China.

Rong Wang (R)

Department of Hematology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.

Hanying Jia (H)

Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China.

Tianyun Zhang (T)

Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China.

Song Jiang (S)

National Clinical Research Center for Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China.

Jinsong Shi (J)

National Clinical Research Center for Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China.

Ming Chen (M)

Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China.

Min-Xin Guan (MX)

Institute of Genetics, Zhejiang University School of Medicine, Hangzhou, China.

Siqi Sun (S)

Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China.

Shanshan Pei (S)

Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China.
Bone Marrow Transplantation Center, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.

Zhihong Liu (Z)

National Clinical Research Center for Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China. liuzhihong@nju.edu.cn.

Ning Shen (N)

Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China. shenningzju@zju.edu.cn.
