Mechanism and modeling of human disease-associated near-exon intronic variants that perturb RNA splicing.


Journal

Nature structural & molecular biology
ISSN: 1545-9985
Abbreviated title: Nat Struct Mol Biol
Country: United States
NLM ID: 101186374

Publication information

Publication date:
November 2022
History:
received: 6 September 2021
accepted: 23 August 2022
pubmed: 28 October 2022
medline: 18 November 2022
entrez: 27 October 2022
Status: ppublish

Abstract

It is estimated that 10%-30% of disease-associated genetic variants affect splicing. Splicing variants may generate deleteriously altered gene products and are potential therapeutic targets. However, systematic diagnosis or prediction of splicing variants is yet to be established, especially for the near-exon intronic splice region. The major challenge lies in the redundant and ill-defined branch sites and other splicing motifs therein. Here, we carried out unbiased massively parallel splicing assays on 5,307 disease-associated variants that overlapped with branch sites and collected 5,884 variants across the 5' splice region. We found that strong splice sites and exonic features preserve splicing from intronic sequence variation. Whereas the splice-altering mechanism of the 3' intronic variants is complex, that of the 5' intronic variants is mainly splice-site destruction. Statistical learning combined with these molecular features allows precise prediction of altered splicing from an intronic variant. This statistical model provides the identity and ranking of biological features that determine splicing, which serves as transferable knowledge and outperforms the benchmarking predictive tool. Moreover, we demonstrated that intronic splicing variants may associate with disease risks in the human population. Our study elucidates the mechanism of splicing response of intronic variants, which classifies disease-associated splicing variants for the promise of precision medicine.

Identifiers

pubmed: 36303034
doi: 10.1038/s41594-022-00844-1
pii: 10.1038/s41594-022-00844-1

Chemical substances

RNA Splice Sites 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

1043-1055

Copyright information

© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.

Authors

Hung-Lun Chiang (HL)

Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan.

Yi-Ting Chen (YT)

Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan.

Jia-Ying Su (JY)

Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan.
Institute of Statistical Science, Academia Sinica, Taipei, Taiwan.
Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan.
Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan.

Hsin-Nan Lin (HN)

Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan.

Chen-Hsin Albert Yu (CA)

Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan.

Yu-Jen Hung (YJ)

Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan.

Yun-Lin Wang (YL)

Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan.

Yen-Tsung Huang (YT)

Institute of Statistical Science, Academia Sinica, Taipei, Taiwan.

Chien-Ling Lin (CL)

Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan. mbcllin@gate.sinica.edu.tw.
