Large-scale long terminal repeat insertions produced a significant set of novel transcripts in cotton.
RNA-seq
epigenetic modification
intergenic transcripts
long terminal repeat
nucleosomes
Journal
Science China. Life sciences
ISSN: 1869-1889
Abbreviated title: Sci China Life Sci
Country: China
NLM ID: 101529880
Publication information
Publication date:
08 2023
History:
received: 14 12 2022
accepted: 03 04 2023
medline: 17 8 2023
pubmed: 20 4 2023
entrez: 20 04 2023
Status:
ppublish
Abstract
Genomic analysis has revealed that the 1,637-Mb Gossypium arboreum genome contains approximately 81% transposable elements (TEs), while only 57% of the 735-Mb G. raimondii genome is occupied by TEs. In this study, we investigated whether there were unknown transcripts associated with TEs or TE fragments and, if so, how these new transcripts evolved and are regulated. As sequencing depth increased from 4 to 100 G, a total of 10,284 novel intergenic transcripts (intergenic genes) were discovered. On average, approximately 84% of these intergenic transcripts appeared to overlap with long terminal repeat (LTR) insertions in otherwise untranscribed intergenic regions and were expressed at relatively low levels. Most of these intergenic transcripts possessed no transcription activation markers, while the majority of the regular genic genes possessed at least one such marker. Genes without transcription activation markers positioned their +1 and -1 nucleosomes closer together (only 117±1.4 bp apart), whereas genes with activation markers showed a much wider spacing (approximately 403.5±46.0 bp apart). An analysis of 183 previously assembled genomes across three different kingdoms demonstrated systematically that the number of intergenic transcripts in a given genome correlated positively with its LTR content. Evolutionary analysis revealed that genic genes originated during whole-genome duplication events, around 137.7 million years ago (MYA) for all eudicot genomes and around 13.7 MYA for the Gossypium family, while the intergenic transcripts evolved around 1.6 MYA as a result of the last LTR insertion. The characterization of these low-transcribed intergenic transcripts can facilitate our understanding of the potential biological roles played by LTRs during speciation and diversification.
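The overlap statistic reported in the abstract (the share of intergenic transcripts that coincide with LTR insertions) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the half-open coordinate convention, the single-chromosome simplification, and the toy coordinates are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps_any(query: Interval, merged: List[Interval], starts: List[int]) -> bool:
    """Return True if query shares at least one base with a merged interval."""
    q_start, q_end = query
    # Last merged interval that starts before the query ends.
    i = bisect_right(starts, q_end - 1) - 1
    # Merged intervals are disjoint, so only this candidate can overlap.
    return i >= 0 and merged[i][1] > q_start


def fraction_overlapping(transcripts: List[Interval], ltrs: List[Interval]) -> float:
    """Fraction of transcripts overlapping at least one LTR annotation."""
    if not transcripts:
        return 0.0
    merged = merge_intervals(ltrs)
    starts = [start for start, _ in merged]
    hits = sum(overlaps_any(t, merged, starts) for t in transcripts)
    return hits / len(transcripts)


if __name__ == "__main__":
    # Hypothetical toy coordinates, not data from the study.
    toy_ltrs = [(100, 200), (150, 300), (500, 600)]
    toy_transcripts = [(50, 100), (90, 110), (300, 500), (599, 700), (250, 260)]
    print(fraction_overlapping(toy_transcripts, toy_ltrs))
```

Merging the LTR annotations first keeps each lookup to a single binary search, which matters when a genome carries many thousands of insertions; a real analysis would also need to handle multiple chromosomes and strand.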
Identifiers
pubmed: 37079218
doi: 10.1007/s11427-022-2341-8
pii: 10.1007/s11427-022-2341-8
Chemical substances
DNA Transposable Elements
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
1711-1724
Copyright information
© 2023. Science China Press.