Improved chromosome-level genome assembly of Indian sandalwood (Santalum album).
Journal
Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date:
21 Dec 2023
History:
received: 01 Aug 2023
accepted: 12 Dec 2023
medline: 22 Dec 2023
pubmed: 22 Dec 2023
entrez: 21 Dec 2023
Status:
epublish
Abstract
Santalum album is a well-known aromatic and medicinal plant that is highly valued for the essential oil (EO) extracted from its heartwood. In this study, we present a high-quality chromosome-level genome assembly of S. album that integrates PacBio Sequel, Illumina HiSeq paired-end, and high-throughput chromosome conformation capture (Hi-C) sequencing. The assembled genome is 207.39 Mb in size, with a contig N50 of 7.33 Mb and a scaffold N50 of 18.31 Mb. The N50 length of this assembly is longer than that of three previously published sandalwood genomes. In total, 94.26% of the assembly was assigned to 10 pseudo-chromosomes, an anchoring rate far higher than a recently reported value. BUSCO analysis yielded a completeness score of 94.91%. In addition, we predicted 23,283 protein-coding genes, 89.68% of which were functionally annotated. This high-quality genome provides a foundation for sandalwood functional genomics studies and for elucidating the genetic basis of EO biosynthesis in S. album.
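The contig and scaffold N50 values quoted in the abstract follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below illustrates that definition only; the function name `n50` and the toy lengths are hypothetical and are not taken from the study or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length of the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (arbitrary units), not data from the paper.
toy_contigs = [8, 6, 4, 3, 2, 1]
print(n50(toy_contigs))
```

In practice the lengths would be read from the assembly FASTA, once for contigs and once for scaffolds, which is why the two N50 values in the abstract differ.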
Identifiers
pubmed: 38129455
doi: 10.1038/s41597-023-02849-x
pii: 10.1038/s41597-023-02849-x
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
921
Grants
Agency: National Natural Science Foundation of China (National Science Foundation of China)
ID: 32171841, 32371925, 31870666
Copyright information
© 2023. The Author(s).
References
Harbaugh, D. T. & Baldwin, B. G. Phylogeny and biogeography of the sandalwoods (Santalum, Santalaceae): repeated dispersals throughout the Pacific. Am. J. Bot. 94, 1028–1040 (2007).
pubmed: 21636472
doi: 10.3732/ajb.94.6.1028
Moniodis, J. et al. The transcriptome of sesquiterpenoid biosynthesis in heartwood xylem of Western Australian sandalwood (Santalum spicatum). Phytochemistry 113, 79–86 (2015).
pubmed: 25624157
doi: 10.1016/j.phytochem.2014.12.009
Zhang, X. H., Teixeira da Silva, J. A., Yan, J. & Ma, G. H. Essential oils composition from roots of Santalum album L. J. Essent. Oil Bear. Pl. 15, 1–6 (2012).
doi: 10.1080/0972060X.2012.10644011
Teixeira da Silva, J. A. et al. Sandalwood: basic biology, tissue culture, and genetic transformation. Planta 243, 847–887 (2016).
pubmed: 26745967
doi: 10.1007/s00425-015-2452-8
Mahesh, H. B. & Gowda, M. In The Sandalwood Genome: Compendium of Plant Genomes (Gowda, M. et al. (eds.)), 1–5 (Springer Nature Switzerland press, 2022).
Burdock, G. A. & Carabin, I. G. Safety assessment of sandalwood oil (Santalum album L.). Food Chem. Toxicol. 46, 421–432 (2008).
pubmed: 17980948
doi: 10.1016/j.fct.2007.09.092
Kim, T. H. et al. Antifungal and ichthyotoxic sesquiterpenoids from Santalum album heartwood. Molecules 22, 1139 (2017).
pubmed: 28698478
pmcid: 6152050
doi: 10.3390/molecules22071139
Bommareddy, A. et al. Medicinal properties of alpha-santalol, a naturally occurring constituent of sandalwood oil: review. Nat. Prod. Res. 33, 527–543 (2019).
pubmed: 29130352
doi: 10.1080/14786419.2017.1399387
Kumar, A. N. A., Joshi, G. & Ram, H. Y. M. Sandalwood: history, uses, present status and the future. Curr. Sci. 103, 1408–1416 (2012).
Tropical Forestry Services (TFS). TFS Sandalwood Project 2015, Indian Sandalwood. Product Disclosure Statement. Tropical Forestry Services Ltd., 169 Broadway, Nedlands WA 6009, Australia (2015).
Baldovini, N., Delasalle, C. & Joulain, D. Phytochemistry of the heartwood from fragrant Santalum species: a review. Flavour Frag. J. 26, 7–26 (2011).
doi: 10.1002/ffj.2025
Jones, C. G. et al. Sandalwood fragrance biosynthesis involves sesquiterpene synthases of both the terpene synthase (TPS)-a and TPS-b subfamilies, including santalene synthases. J. Biol. Chem. 286, 17445–17454 (2011).
pubmed: 21454632
pmcid: 3093818
doi: 10.1074/jbc.M111.231787
Diaz-Chavez, M. L. et al. Biosynthesis of sandalwood oil: Santalum album CYP76F cytochromes P450 produce santalols and bergamotol. PLoS One 8, e75053 (2013).
pubmed: 24324844
pmcid: 3854609
doi: 10.1371/journal.pone.0075053
Celedon, J. M. et al. Heartwood-specific transcriptome and metabolite signatures of tropical sandalwood (Santalum album) reveal the final step of (Z)-santalol fragrance biosynthesis. Plant J. 86, 289–299 (2016).
pubmed: 26991058
doi: 10.1111/tpj.13162
Niu, M. Y. et al. Cloning and expression analysis of mevalonate kinase and phosphomevalonate kinase genes associated with the MVA pathway in Santalum album. Sci. Rep. 11, 16913 (2021).
pubmed: 34413433
pmcid: 8376994
doi: 10.1038/s41598-021-96511-4
Niu, M. Y. et al. Cloning, characterization, and functional analysis of acetyl-CoA C-acetyltransferase and 3-hydroxy-3-methylglutaryl-CoA synthase genes in Santalum album. Sci. Rep. 11, 1082 (2021).
pubmed: 33441887
pmcid: 7807033
doi: 10.1038/s41598-020-80268-3
Mahesh, H. B. et al. Multi-omics driven assembly and annotation of the sandalwood (Santalum album) genome. Plant Physiol. 176, 2772–2788 (2018).
pubmed: 29440596
pmcid: 5884603
doi: 10.1104/pp.17.01764
Dasgupta, M. G., Ulaganathan, K., Dev, S. A. & Balakrishnan, S. Draft genome of Santalum album L. provides genomic resources for accelerated trait improvement. Tree Genet. Genomes 15, 34 (2019).
doi: 10.1007/s11295-019-1334-9
Hong, Z. et al. Chromosome-level genome assemblies from two sandalwood species provide insights into the evolution of the Santalales. Commun. Biol. 6, 587 (2023).
pubmed: 37264116
pmcid: 10235099
doi: 10.1038/s42003-023-04980-2
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
pubmed: 26059717
doi: 10.1093/bioinformatics/btv351
Bennetzen, J. L. & Wang, H. The contributions of transposable elements to the structure, function, and evolution of plant genomes. Annu. Rev. Plant Biol. 65, 505–530 (2014).
pubmed: 24579996
doi: 10.1146/annurev-arplant-050213-035811
Porebski, S., Bailey, L. G. & Baum, B. R. Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Mol. Biol. Rep. 15, 8–15 (1997).
doi: 10.1007/BF02772108
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
pubmed: 28298431
pmcid: 5411767
doi: 10.1101/gr.215087.116
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
pubmed: 21217122
pmcid: 3051319
doi: 10.1093/bioinformatics/btr011
Chen, Y. X. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience 7, 1–6 (2018).
pubmed: 29659813
pmcid: 5827348
doi: 10.1093/gigascience/gix120
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
pubmed: 28369201
pmcid: 5870704
doi: 10.1093/bioinformatics/btx153
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
pubmed: 22388286
pmcid: 3322381
doi: 10.1038/nmeth.1923
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
pubmed: 26619908
pmcid: 4665391
doi: 10.1186/s13059-015-0831-x
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
pubmed: 27467249
pmcid: 5846465
doi: 10.1016/j.cels.2016.07.002
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
pubmed: 28336562
pmcid: 5635820
doi: 10.1126/science.aal3327
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
pubmed: 19451168
pmcid: 2705234
doi: 10.1093/bioinformatics/btp324
Zhang, X. H. et al. Identification and functional characterization of three new terpene synthase genes involved in chemical defense and abiotic stresses in Santalum album. BMC Plant Biol. 19, 115 (2019).
pubmed: 30922222
pmcid: 6437863
doi: 10.1186/s12870-019-1720-3
Kolosova, N. et al. Isolation of high-quality RNA from gymnosperm and angiosperm trees. Biotechniques 36, 821–824 (2004).
pubmed: 15152602
doi: 10.2144/04365ST06
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
pubmed: 17485477
pmcid: 1933203
doi: 10.1093/nar/gkm286
Edgar, R. C. & Myers, E. W. PILER: identification and classification of genomic repeats. Bioinformatics 21(Suppl 1), i152–i158 (2005).
pubmed: 15961452
doi: 10.1093/bioinformatics/bti1003
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl 1), i351–i358 (2005).
pubmed: 15961478
doi: 10.1093/bioinformatics/bti1018
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-3.0. 1996–2010. (2010).
Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).
pubmed: 15123596
pmcid: 479130
doi: 10.1101/gr.1865504
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
pubmed: 16845043
pmcid: 1538822
doi: 10.1093/nar/gkl200
Johnson, A. D. et al. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics 24, 2938–2939 (2008).
pubmed: 18974171
pmcid: 2720775
doi: 10.1093/bioinformatics/btn564
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
doi: 10.1006/jmbi.1997.0951
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
pubmed: 25751142
pmcid: 4655817
doi: 10.1038/nmeth.3317
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
pubmed: 25690850
pmcid: 4643835
doi: 10.1038/nbt.3122
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, R7 (2008).
pubmed: 18190707
pmcid: 2395244
doi: 10.1186/gb-2008-9-1-r7
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
pubmed: 9023104
pmcid: 146525
doi: 10.1093/nar/25.5.955
Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, D121–D124 (2005).
pubmed: 15608160
doi: 10.1093/nar/gki081
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
pubmed: 31727128
pmcid: 6857279
doi: 10.1186/s13059-019-1832-y
NGDC Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA009778/CRX582846 (2023).
NGDC Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA009778/CRX582847 (2023).
NGDC Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA009778/CRX582848 (2023).
NGDC Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA009778/CRX582849 (2023).
Zhang, X. H. et al. Santalum album TX1, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc.gca:GCA_034195605.1 (2023).
Zhang, X. H. et al. Improved chromosome-level genome assembly of Indian sandalwood (Santalum album). figshare https://doi.org/10.6084/m9.figshare.23694729.v1 (2023).