Transcript-targeted analysis reveals isoform alterations and double-hop fusions in breast cancer.
Journal
Communications biology
ISSN: 2399-3642
Abbreviated title: Commun Biol
Country: England
NLM ID: 101719179
Publication information
Publication date: 2021-11-22
History:
received: 2020-11-16
accepted: 2021-11-02
entrez: 2021-11-23
pubmed: 2021-11-24
medline: 2021-12-25
Status: epublish
Abstract
Although transcriptome alteration is an essential driver of carcinogenesis, the effects of chromosomal structural alterations on the cancer transcriptome are not yet fully understood. Short-read transcript sequencing has prevented researchers from directly exploring full-length transcripts, forcing them to focus on individual splice sites. Here, we develop a pipeline for Multi-Sample long-read Transcriptome Assembly (MuSTA), which enables construction of a transcriptome from long-read sequence data. Using the constructed transcriptome as a reference, we analyze RNA extracted from 22 clinical breast cancer specimens. We identify a comprehensive set of subtype-specific and differentially used isoforms, which extends our knowledge of isoform regulation to unannotated isoforms, including a short form of TNS3. We also find that the exon-intron structure of fusion transcripts depends on their genomic context, and we identify double-hop fusion transcripts that are transcribed from complex structural rearrangements. For example, a double-hop fusion results in aberrant expression of an endogenous retroviral gene, ERVFRD-1, which is normally expressed exclusively in placenta and is thought to protect the fetus from maternal rejection; its expression is elevated in several TCGA samples with ERVFRD-1 fusions. Our analyses provide direct evidence that full-length transcript sequencing of clinical samples can add to our understanding of cancer biology and genomics in general.
Identifiers
pubmed: 34811492
doi: 10.1038/s42003-021-02833-4
pii: 10.1038/s42003-021-02833-4
pmc: PMC8608905
Chemical substances (name, registry number)
Protein Isoforms, 0
TNS3 protein, human, 0
Tensins, 0
RNA, 63231-63-0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1320
Grants
Agency: MEXT | Japan Society for the Promotion of Science (JSPS)
ID: 16K07143
Agency: MEXT | Japan Society for the Promotion of Science (JSPS)
ID: 21H02772
Copyright information
© 2021. The Author(s).