Mechanism and modeling of human disease-associated near-exon intronic variants that perturb RNA splicing.
Journal
Nature Structural & Molecular Biology
ISSN: 1545-9985
Abbreviated title: Nat Struct Mol Biol
Country: United States
NLM ID: 101186374
Publication information
Publication date: November 2022
History:
received: 6 September 2021
accepted: 23 August 2022
pubmed: 28 October 2022
medline: 18 November 2022
entrez: 27 October 2022
Status: ppublish
Abstract
It is estimated that 10%–30% of disease-associated genetic variants affect splicing. Splicing variants may generate deleteriously altered gene products and are potential therapeutic targets. However, systematic diagnosis or prediction of splicing variants has yet to be established, especially for the near-exon intronic splice region. The major challenge lies in the redundant and ill-defined branch sites and other splicing motifs in this region. Here, we carried out unbiased massively parallel splicing assays on 5,307 disease-associated variants that overlapped with branch sites and collected 5,884 variants across the 5' splice region. We found that strong splice sites and exonic features protect splicing against intronic sequence variation. Whereas the splice-altering mechanism of 3' intronic variants is complex, that of 5' intronic variants is mainly splice-site destruction. Statistical learning combined with these molecular features allows precise prediction of altered splicing from an intronic variant. The resulting statistical model provides the identity and ranking of the biological features that determine splicing, which serves as transferable knowledge, and it outperforms the benchmark predictive tool. Moreover, we demonstrated that intronic splicing variants may be associated with disease risk in the human population. Our study elucidates how splicing responds to intronic variants and helps classify disease-associated splicing variants, with promise for precision medicine.
Identifiers
pubmed: 36303034
doi: 10.1038/s41594-022-00844-1
pii: 10.1038/s41594-022-00844-1
Chemical substances
RNA Splice Sites
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1043–1055
Copyright information
© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.