Two telomere-to-telomere gapless genomes reveal insights into Capsicum evolution and capsaicinoid biosynthesis.

Capsicum / genetics; Capsaicin / metabolism; Genome, Plant; Telomere / genetics; Phylogeny; Evolution, Molecular; Fruit / genetics; Retroelements / genetics; Gene Expression Regulation, Plant

Journal

Nature communications

ISSN: 2041-1723

Abbreviated title: Nat Commun

Country: England

NLM ID: 101528555

Publication information

Publication date:
20 May 2024

History:

received: 11 Oct 2023

accepted: 08 May 2024

medline: 21 May 2024

pubmed: 21 May 2024

entrez: 20 May 2024

Status: epublish

Abstract

Chili pepper (Capsicum) is known for its unique fruit pungency due to the presence of capsaicinoids. The evolutionary history of capsaicinoid biosynthesis and the mechanism of their tissue specificity remain obscure due to the lack of high-quality Capsicum genomes. Here, we report two telomere-to-telomere (T2T) gap-free genomes of C. annuum and its wild nonpungent relative C. rhomboideum to investigate the evolution of fruit pungency in chili peppers. We precisely delineate Capsicum centromeres, which lack high-copy tandem repeats but are extensively invaded by CRM retrotransposons. Through phylogenomic analyses, we estimate the evolutionary timing of capsaicinoid biosynthesis. We reveal disrupted coding and regulatory regions of key biosynthesis genes in nonpungent species. We also find conserved placenta-specific accessible chromatin regions, which likely allow for tissue-specific biosynthetic gene coregulation and capsaicinoid accumulation. These T2T genomic resources will accelerate chili pepper genetic improvement and help to understand Capsicum genome evolution.

Identifiers

DOI: 10.1038/s41467-024-48643-0 PMID: 38769327

pii: 10.1038/s41467-024-48643-0

Publication types

Journal Article

Languages

eng

Pagination

4295

References

Stewart, C. Jr. et al. The Pun1 gene for pungency in pepper encodes a putative acyltransferase. Plant J. 42, 675–688 (2005).

pubmed: 15918882 doi: 10.1111/j.1365-313X.2005.02410.x

Kim, S. et al. Genome sequence of the hot pepper provides insights into the evolution of pungency in Capsicum species. Nat. Genet. 46, 270–278 (2014).

pubmed: 24441736 doi: 10.1038/ng.2877

Stewart, C. Jr. et al. Genetic control of pungency in C. chinense via the Pun1 locus. J. Exp. Bot. 58, 979–991 (2007).

pubmed: 17339653 doi: 10.1093/jxb/erl243

Liao, Y. et al. The 3D architecture of the pepper genome and its relationship to function and evolution. Nat. Commun. 13, 3479 (2022).

pubmed: 35710823 pmcid: 9203530 doi: 10.1038/s41467-022-31112-x

Shirasawa, K., Hosokawa, M., Yasui, Y., Toyoda, A. & Isobe, S. Chromosome-scale genome assembly of a Japanese chili pepper landrace, Capsicum annuum ‘Takanotsume’. DNA Res. 30, dsac052 (2023).

pubmed: 36566389 doi: 10.1093/dnares/dsac052

Lee, J. H. et al. High-quality chromosome-scale genomes facilitate effective identification of large structural variations in hot and sweet peppers. Hortic. Res. 9, uhac210 (2022).

pubmed: 36467270 pmcid: 9715575 doi: 10.1093/hr/uhac210

Qin, C. et al. Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. Proc. Natl. Acad. Sci. USA 111, 5135–5140 (2014).

pubmed: 24591624 pmcid: 3986200 doi: 10.1073/pnas.1400975111

Kim, S. et al. New reference genome sequences of hot pepper reveal the massive evolution of plant disease-resistance genes by retroduplication. Genome Biol. 18, 210 (2017).

pubmed: 29089032 pmcid: 5664825 doi: 10.1186/s13059-017-1341-9

Hulse-Kemp, A. M. et al. Reference quality assembly of the 3.5-Gb genome of Capsicum annuum from a single linked-read library. Hortic. Res. 5, 4 (2018).

pubmed: 29423234 pmcid: 5798813 doi: 10.1038/s41438-017-0011-0

Kim, M. S. et al. Comparative analysis of de novo genomes reveals dynamic intra-species divergence of NLRs in pepper. BMC Plant Biol. 21, 247 (2021).

pubmed: 34059006 pmcid: 8166135 doi: 10.1186/s12870-021-03057-8

Liu, F. et al. Genomes of cultivated and wild Capsicum species provide insights into pepper domestication and population differentiation. Nat. Commun. 14, 5487 (2023).

pubmed: 37679363 pmcid: 10484947 doi: 10.1038/s41467-023-41251-4

Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).

pubmed: 35357919 pmcid: 9186530 doi: 10.1126/science.abj6987

Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022).

pubmed: 35357911 pmcid: 9233505 doi: 10.1126/science.abl4178

Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022).

pubmed: 35357915 pmcid: 9170183 doi: 10.1126/science.abj5089

Aganezov, S. et al. A complete reference genome improves analysis of human genetic variation. Science 376, eabl3533 (2022).

pubmed: 35357935 pmcid: 9336181 doi: 10.1126/science.abl3533

Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).

pubmed: 34762468 pmcid: 10164409 doi: 10.1126/science.abi7489

Hou, X., Wang, D., Cheng, Z., Wang, Y. & Jiao, Y. A near-complete assembly of an Arabidopsis thaliana genome. Mol. Plant 15, 1247–1250 (2022).

pubmed: 35655433 doi: 10.1016/j.molp.2022.05.014

Wang, B. et al. High-quality Arabidopsis thaliana genome assembly with Nanopore and HiFi long reads. Genom. Proteom. Bioinf. 20, 4–13 (2022).

doi: 10.1016/j.gpb.2021.08.003

Song, J. et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol. Plant 14, 1757–1767 (2021).

pubmed: 34171480 doi: 10.1016/j.molp.2021.06.018

Shang, L. et al. A complete assembly of the rice Nipponbare reference genome. Mol. Plant 16, 1232–1236 (2023).

pubmed: 37553831 doi: 10.1016/j.molp.2023.08.003

Yang, X. et al. The gap-free potato genome assembly reveals large tandem gene clusters of agronomical importance in highly repeated genomic regions. Mol. Plant 16, 314–317 (2023).

pubmed: 36528795 doi: 10.1016/j.molp.2022.12.010

Wang, L. et al. A telomere-to-telomere gap-free assembly of soybean genome. Mol. Plant 16, 1711–1714 (2023).

Chen, J. et al. A complete telomere-to-telomere assembly of the maize genome. Nat. Genet. 55, 1–11 (2023).

Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).

pubmed: 33526886 pmcid: 7961889 doi: 10.1038/s41592-020-01056-5

Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol. 25, 107 (2024).

Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).

pubmed: 27467250 pmcid: 5596920 doi: 10.1016/j.cels.2015.07.012

Zhang, Z. X. et al. Discovery of putative capsaicin biosynthetic genes by RNA-Seq and digital gene expression analysis of pepper. Sci. Rep. 6, 34121 (2016).

pubmed: 27756914 pmcid: 5069471 doi: 10.1038/srep34121

Cleveland, D. W., Mao, Y. & Sullivan, K. F. Centromeres and kinetochores: from epigenetics to mitotic checkpoint signaling. Cell 112, 407–421 (2003).

pubmed: 12600307 doi: 10.1016/S0092-8674(03)00115-6

Zhang, H. et al. Boom-bust turnovers of megabase-sized centromeric DNA in Solanum species: Rapid evolution of DNA sequences associated with centromeres. Plant Cell 26, 1436–1447 (2014).

pubmed: 24728646 pmcid: 4036563 doi: 10.1105/tpc.114.123877

Ahmed, H. I. et al. Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature 620, 830–838 (2023).

pubmed: 37532937 pmcid: 10447253 doi: 10.1038/s41586-023-06389-7

Yang, Z. et al. Cotton D genome assemblies built with long-read data unveil mechanisms of centromere evolution and stress tolerance divergence. BMC Biol. 19, 1–22 (2021).

doi: 10.1186/s12915-021-01041-0

Vitte, C. & Panaud, O. Formation of solo-LTRs through unequal homologous recombination counterbalances amplifications of LTR retrotransposons in rice Oryza sativa L. Mol. Biol. Evol. 20, 528–540 (2003).

pubmed: 12654934 doi: 10.1093/molbev/msg055

Neumann, P. et al. Plant centromeric retrotransposons: a structural and cytogenetic perspective. Mobile DNA 2, 4 (2011).

pubmed: 21371312 pmcid: 3059260 doi: 10.1186/1759-8753-2-4

Wlodzimierz, P. et al. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature 618, 557–565 (2023).

pubmed: 37198485 doi: 10.1038/s41586-023-06062-z

Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).

pubmed: 31727128 pmcid: 6857279 doi: 10.1186/s13059-019-1832-y

Zhu, Z. et al. Natural variations in the MYB transcription factor MYB31 determine the evolution of extremely pungent peppers. New Phytol. 223, 922–938 (2019).

pubmed: 31087356 doi: 10.1111/nph.15853

Sun, B. et al. Coexpression network analysis reveals an MYB transcriptional activator involved in capsaicinoid biosynthesis in hot peppers. Hortic. Res. 7, 162 (2020).

pubmed: 33082969 pmcid: 7527512 doi: 10.1038/s41438-020-00381-2

Carrizo, G. C. et al. Phylogenetic relationships, diversification and expansion of chili peppers (Capsicum, Solanaceae). Ann. Bot. 118, 35–51 (2016).

doi: 10.1093/aob/mcw079

Guo, L. et al. The opium poppy genome and morphinan production. Science 362, 343–347 (2018).

pubmed: 30166436 doi: 10.1126/science.aat4096

Huang, A. C. et al. A specialized metabolic network selectively modulates Arabidopsis root microbiota. Science 364, eaau6389 (2019).

pubmed: 31073042 doi: 10.1126/science.aau6389

Nett, R. S., Lau, W. & Sattely, E. S. Discovery and engineering of colchicine alkaloid biosynthesis. Nature 584, 148–153 (2020).

pubmed: 32699417 pmcid: 7958869 doi: 10.1038/s41586-020-2546-8

He, J. et al. Establishing Physalis as a Solanaceae model system enables genetic reevaluation of the inflated calyx syndrome. Plant Cell 35, 351–368 (2023).

pubmed: 36268892 doi: 10.1093/plcell/koac305

Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).

pubmed: 22652625 doi: 10.1016/j.ymeth.2012.05.001

Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).

pubmed: 26619908 pmcid: 4665391 doi: 10.1186/s13059-015-0831-x

Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).

pubmed: 21217122 pmcid: 3051319 doi: 10.1093/bioinformatics/btr011

Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).

pubmed: 28369201 pmcid: 5870704 doi: 10.1093/bioinformatics/btx153

Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020).

pubmed: 31778144 doi: 10.1093/bioinformatics/btz891

Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

pubmed: 29750242 pmcid: 6137996 doi: 10.1093/bioinformatics/bty191

Chakraborty, M., Baldwin-Brown, J. G., Long, A. D. & Emerson, J. J. Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. Nucleic Acids Res. 44, e147 (2016).

pubmed: 27458204 pmcid: 5100563

Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).

pubmed: 27467249 pmcid: 5846465 doi: 10.1016/j.cels.2016.07.002

Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).

pubmed: 28336562 pmcid: 5635820 doi: 10.1126/science.aal3327

Bzikadze, A. V. & Pevzner, P. A. Automated assembly of centromeres from ultra-long error-prone reads. Nat. Biotechnol. 38, 1309–1316 (2020).

pubmed: 32665660 pmcid: 10718184 doi: 10.1038/s41587-020-0582-4

Jain, C., Rhie, A., Hansen, N. F., Koren, S. & Phillippy, A. M. Long-read mapping to repetitive reference sequences using Winnowmap2. Nat. Methods 19, 705–710 (2022).

pubmed: 35365778 pmcid: 10510034 doi: 10.1038/s41592-022-01457-8

Cabanettes, F. & Klopp, C. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ 6, e4958 (2018).

pubmed: 29888139 pmcid: 5991294 doi: 10.7717/peerj.4958

Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

pubmed: 19451168 pmcid: 2705234 doi: 10.1093/bioinformatics/btp324

Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

pubmed: 19505943 pmcid: 2723002 doi: 10.1093/bioinformatics/btp352

Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).

pubmed: 21221095 pmcid: 3346182 doi: 10.1038/nbt.1754

Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO Update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).

pubmed: 34320186 pmcid: 8476166 doi: 10.1093/molbev/msab199

Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).

pubmed: 32928274 pmcid: 7488777 doi: 10.1186/s13059-020-02134-9

Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).

pubmed: 9862982 pmcid: 148217 doi: 10.1093/nar/27.2.573

Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 5, 4–10 (2009).

Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).

pubmed: 17485477 pmcid: 1933203 doi: 10.1093/nar/gkm286

Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).

pubmed: 18194517 pmcid: 2253517 doi: 10.1186/1471-2105-9-18

Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2018).

pubmed: 29233850 doi: 10.1104/pp.17.01310

Jin, Y., Tam, O. H., Paniagua, E. & Hammell, M. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics 31, 3593–3599 (2015).

pubmed: 26206304 pmcid: 4757950 doi: 10.1093/bioinformatics/btv422

Zhang, R. G. et al. TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic. Res. 9, uhac017 (2022).

pubmed: 35184178 pmcid: 9002660 doi: 10.1093/hr/uhac017

Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).

pubmed: 18025269 pmcid: 2134774 doi: 10.1101/gr.6743907

Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).

pubmed: 25690850 pmcid: 4643835 doi: 10.1038/nbt.3122

Slater, G. S. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).

pubmed: 15713233 pmcid: 553969 doi: 10.1186/1471-2105-6-31

Eilbeck, K., Moore, B., Holt, C. & Yandell, M. Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 10, 67 (2009).

pubmed: 19236712 pmcid: 2653490 doi: 10.1186/1471-2105-10-67

Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).

pubmed: 15144565 pmcid: 421630 doi: 10.1186/1471-2105-5-59

Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom. Bioinform. 3, lqaa108 (2021).

pubmed: 33575650 pmcid: 7787252 doi: 10.1093/nargab/lqaa108

Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).

pubmed: 25751142 pmcid: 4655817 doi: 10.1038/nmeth.3317

Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y. O. & Borodovsky, M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33, 6494–6506 (2005).

pubmed: 16314312 pmcid: 1298918 doi: 10.1093/nar/gki937

Stanke, M., Tzvetkova, A. & Morgenstern, B. AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biol. 7, S11 (2006).

pmcid: 1810548 doi: 10.1186/gb-2006-7-s1-s11

Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643 (2021).

pubmed: 33320174 pmcid: 8289374 doi: 10.1093/bioinformatics/btaa1016

Pertea, G. & Pertea, M. GFF Utilities: GffRead and GffCompare. F1000Res 9, 304 (2020).

Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).

pubmed: 19505945 pmcid: 2712344 doi: 10.1093/bioinformatics/btp348

Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).

pubmed: 24451623 pmcid: 3998144 doi: 10.1093/bioinformatics/btu033

Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).

pubmed: 17483113 doi: 10.1093/molbev/msm088

Mendes, F. K., Vanderpool, D., Fulton, B. & Hahn, M. W. CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics 36, 5516–5518 (2021).

pubmed: 33325502 doi: 10.1093/bioinformatics/btaa1022

Tang, H., Krishnakumar, V. & Li, J. jcvi: JCVI utility libraries. Zenodo https://doi.org/10.5281/zenodo.31631 (2015).

Krumsiek, J., Arnold, R. & Rattei, T. Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23, 1026–1028 (2007).

pubmed: 17309896 doi: 10.1093/bioinformatics/btm039

Zhang, Z. et al. ParaAT: a parallel tool for constructing multiple protein-coding DNA alignments. Biochem. Biophys. Res. Commun. 419, 779–781 (2012).

pubmed: 22390928 doi: 10.1016/j.bbrc.2012.02.101

Servant, N. et al. HiTC: exploration of high-throughput ‘C’ experiments. Bioinformatics 28, 2843–2844 (2012).

pubmed: 22923296 pmcid: 3476334 doi: 10.1093/bioinformatics/bts521

Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv:1303.3997 (2013).

Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).

pubmed: 18798982 pmcid: 2592715 doi: 10.1186/gb-2008-9-9-r137

Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

pubmed: 22388286 pmcid: 3322381 doi: 10.1038/nmeth.1923

Vollger, M. R. et al. StainedGlass: Interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).

pubmed: 35020798 pmcid: 8963321 doi: 10.1093/bioinformatics/btac018

Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).

pubmed: 21493656 pmcid: 3102221 doi: 10.1093/bioinformatics/btr167

Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).

pubmed: 27043002 doi: 10.1038/nbt.3519

Chen, W. & Guo, L. Scripts used in ‘Two telomere-to-telomere gapless genomes reveal insights into Capsicum evolution and capsaicinoid biosynthesis’. Zenodo https://doi.org/10.5281/zenodo.11078975 (2024).