SpliceTransformer predicts tissue-specific splicing linked to human diseases.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
23 Oct 2024
History:
received: 27 Nov 2023
accepted: 24 Sep 2024
medline: 24 Oct 2024
pubmed: 24 Oct 2024
entrez: 23 Oct 2024
Status:
epublish
Abstract
We present SpliceTransformer (SpTransformer), a deep-learning framework that predicts tissue-specific RNA splicing alterations linked to human diseases based on genomic sequence. SpTransformer outperforms all previous methods on splicing prediction. Application to approximately 1.3 million genetic variants in the ClinVar database reveals that splicing alterations account for 60% of intronic and synonymous pathogenic mutations, and occur at different frequencies across tissue types. Importantly, tissue-specific splicing alterations match their clinical manifestations independent of gene expression variation. We validate the enrichment in three brain disease datasets involving over 164,000 individuals. Additionally, we identify single nucleotide variations that cause brain-specific splicing alterations, and find disease-associated genes harboring these single nucleotide variations with distinct expression patterns involved in diverse biological processes. Finally, SpTransformer analysis of whole exon sequencing data from blood samples of patients with diabetic nephropathy predicts kidney-specific RNA splicing alterations with 83% accuracy, demonstrating the potential to infer disease-causing tissue-specific splicing events. SpTransformer provides a powerful tool to guide biological and clinical interpretations of human diseases.
Identifiers
pubmed: 39443442
doi: 10.1038/s41467-024-53088-6
pii: 10.1038/s41467-024-53088-6
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
9129
Copyright information
© 2024. The Author(s).