Chromosome-level genome assembly and functional annotation of Citrullus colocynthis: unlocking genetic resources for drought-resilient crop development.
Comparative genomics
Desert plant
Drought tolerance
Gene annotation
Genetic variation
Wild crop relatives
Journal
Planta
ISSN: 1432-2048
Abbreviated title: Planta
Country: Germany
NLM ID: 1250576
Publication information
Publication date:
23 Oct 2024
History:
received: 01 Apr 2024
accepted: 11 Oct 2024
medline: 24 Oct 2024
pubmed: 24 Oct 2024
entrez: 23 Oct 2024
Status:
epublish
Abstract
The chromosome-level genome assembly of Citrullus colocynthis reveals its genetic potential for enhancing drought tolerance, paving the way for innovative crop improvement strategies. This study presents the first comprehensive genome assembly and annotation of Citrullus colocynthis, a drought-tolerant close wild relative of cultivated watermelon, highlighting its potential for enhancing agricultural resilience to climate change. The study achieved a chromosome-level assembly using advanced sequencing technologies, including PacBio HiFi and Hi-C, revealing a genome size of approximately 366 Mb with low heterozygosity and substantial repetitive content. Our analysis identified 23,327 gene models that could encode stress response mechanisms underlying the species' adaptation to arid environments. Comparative genomics with closely related species illuminated the evolutionary dynamics within the Cucurbitaceae family. In addition, resequencing of 27 accessions from the United Arab Emirates (UAE) identified genetic diversity, suggesting a foundation for future breeding programs. This genomic resource opens new avenues for the de novo domestication of C. colocynthis, offering a blueprint for developing crops with enhanced drought tolerance, disease resistance, and nutritional profiles, crucial for sustaining future food security in the face of escalating climate challenges.
Identifiers
pubmed: 39443340
doi: 10.1007/s00425-024-04551-7
pii: 10.1007/s00425-024-04551-7
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
124
Copyright information
© 2024. The Author(s).