Two telomere-to-telomere gapless genomes reveal insights into Capsicum evolution and capsaicinoid biosynthesis.
Journal
Nature Communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
20 May 2024
History:
received: 11 Oct 2023
accepted: 08 May 2024
medline: 21 May 2024
pubmed: 21 May 2024
entrez: 20 May 2024
Status:
epublish
Abstract
Chili pepper (Capsicum) is known for its unique fruit pungency due to the presence of capsaicinoids. The evolutionary history of capsaicinoid biosynthesis and the mechanism of their tissue specificity remain obscure due to the lack of high-quality Capsicum genomes. Here, we report two telomere-to-telomere (T2T) gap-free genomes of C. annuum and its wild nonpungent relative C. rhomboideum to investigate the evolution of fruit pungency in chili peppers. We precisely delineate Capsicum centromeres, which lack high-copy tandem repeats but are extensively invaded by CRM retrotransposons. Through phylogenomic analyses, we estimate the evolutionary timing of capsaicinoid biosynthesis. We reveal disrupted coding and regulatory regions of key biosynthesis genes in nonpungent species. We also find conserved placenta-specific accessible chromatin regions, which likely allow for tissue-specific biosynthetic gene coregulation and capsaicinoid accumulation. These T2T genomic resources will accelerate chili pepper genetic improvement and help to understand Capsicum genome evolution.
Identifiers
pubmed: 38769327
doi: 10.1038/s41467-024-48643-0
pii: 10.1038/s41467-024-48643-0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
4295
Copyright information
© 2024. The Author(s).