The phased telomere-to-telomere reference genome of Musa acuminata, a main contributor to banana cultivars.
Journal
Scientific Data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date:
16 09 2023
History:
received: 06 06 2023
accepted: 05 09 2023
medline: 18 9 2023
pubmed: 17 9 2023
entrez: 16 9 2023
Status:
epublish
Abstract
Musa acuminata is a main wild contributor to banana cultivars. Here, we report a haplotype-resolved, telomere-to-telomere reference genome of M. acuminata assembled from PacBio HiFi reads, Nanopore ultra-long reads, and Hi-C data. The genome sizes of the two haploid assemblies were estimated to be 469.83 Mb and 470.21 Mb, respectively. Multiple assessments confirmed the contiguity (contig N50: 16.53 Mb and 18.58 Mb; LAI: 20.18 and 19.48), completeness (BUSCOs: 98.57% and 98.57%), and correctness (QV: 45.97 and 46.12) of the genome. Repetitive sequences accounted for about half of the genome. In total, 40,889 and 38,269 protein-coding genes were annotated in the two haploid assemblies, respectively, of which 9.56% and 3.37% were newly predicted. Genome comparison identified a large reciprocal translocation involving 3 Mb and 10 Mb from chromosomes 01 and 04 within M. acuminata. This reference genome provides a valuable resource for further understanding subgenome evolution in Musa species and for precise genetic improvement of banana.
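The contiguity and correctness figures in the abstract (contig N50 and QV) follow standard definitions, which the short Python sketch below illustrates. It is not the authors' pipeline: the function names `n50` and `qv_to_error_rate` are hypothetical, and the sketch assumes that N50 is the length of the contig at which the cumulative length, taken from longest to shortest, first reaches half of the total, and that QV is Phred-scaled.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: walking contigs from longest to shortest,
    N50 is the length of the contig at which the running sum first
    reaches at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must contain at least one positive value")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


if __name__ == "__main__":
    # Toy contig lengths, not data from the paper.
    print(n50([2, 3, 4, 5, 6]))
    # QV values reported in the abstract for the two haploid assemblies.
    for qv in (45.97, 46.12):
        print(qv, qv_to_error_rate(qv))
```

Under the Phred assumption, a QV near 46 corresponds to an error probability of a few in 100,000 bases.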
Identifiers
pubmed: 37716992
doi: 10.1038/s41597-023-02546-9
pii: 10.1038/s41597-023-02546-9
pmc: PMC10505225
Publication types
Dataset
Journal Article
Languages
eng
Citation subsets
IM
Pagination
631
Grants
Organization: National Natural Science Foundation of China (National Science Foundation of China)
ID: 32070237, 31261140366
Copyright information
© 2023. Springer Nature Limited.
References
Brozynska, M., Furtado, A. & Henry, R. J. Genomics of crop wild relatives: expanding the gene pool for crop improvement. Plant Biotechnol. J. 14, 1070–1085 (2016).
pubmed: 26311018
Bohra, A. et al. Reap the crop wild relatives for breeding future crops. Trends Biotechnol. 40, 412–431 (2022).
pubmed: 34629170
Castaneda-Alvarez, N. P. et al. Global conservation priorities for crop wild relatives. Nat. Plants 2, 16022 (2016).
pubmed: 27249561
Perrier, X. et al. Multidisciplinary perspectives on banana (Musa spp.) domestication. Proc. Natl. Acad. Sci. USA 108, 11311–11318 (2011).
pubmed: 21730145
pmcid: 3136277
Davey, M. W. et al. A draft Musa balbisiana genome sequence for molecular genetics in polyploid, inter- and intra-specific Musa hybrids. BMC Genom. 14, 683 (2013).
Perrier, X. et al. Combining biological approaches to shed light on the evolution of edible bananas. Ethnobot. Res. App. 7, 199–216 (2009).
Shepherd, K. Cytogenetics Of The Genus Musa (International Network for the Improvement of Banana and Plantain, 1999).
Hippolyte, I. et al. A saturated SSR/DArT linkage map of Musa acuminata addressing genome rearrangements among bananas. BMC Plant Biol. 10, 65 (2010).
pubmed: 20388207
pmcid: 2923539
Martin, G. et al. Evolution of the banana genome (Musa acuminata) is impacted by large chromosomal translocations. Mol. Biol. Evol. 34, 2140–2152 (2017).
pubmed: 28575404
pmcid: 5850475
Dupouy, M. et al. Two large reciprocal translocations characterized in the disease resistance-rich burmannica genetic group of Musa acuminata. Ann. Bot. 124, 319–329 (2019).
pubmed: 31241133
pmcid: 6758587
Martin, G. et al. Chromosome reciprocal translocations have accompanied subspecies evolution in bananas. Plant J. 104, 1698–1711 (2020).
pubmed: 33067829
pmcid: 7839431
D’Hont, A. et al. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488, 213–217 (2012).
pubmed: 22801500
Belser, C. et al. Telomere-to-telomere gapless chromosomes of banana using nanopore sequencing. Commun. Biol. 4, 1047 (2021).
pubmed: 34493830
pmcid: 8423783
Hu, G. et al. Two divergent haplotypes from a highly heterozygous lychee genome suggest independent domestication events for early and late-maturing cultivars. Nat. Genet. 54, 73–83 (2022).
pubmed: 34980919
pmcid: 8755541
Sun, X. et al. Phased diploid genome assemblies and pan-genomes provide insights into the genetic history of apple domestication. Nat. Genet. 52, 1423–1432 (2020).
pubmed: 33139952
pmcid: 7728601
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
pubmed: 21217122
pmcid: 3051319
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
pubmed: 32188846
pmcid: 7080791
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
doi: 10.1093/bioinformatics/bty560
pubmed: 30423086
pmcid: 6129281
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
pubmed: 33526886
pmcid: 7961889
Alonge, M. et al. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol. 20, 224 (2019).
pubmed: 31661016
pmcid: 6816165
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
pubmed: 27467249
pmcid: 5846465
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
pubmed: 28336562
pmcid: 5635820
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
pubmed: 27467250
pmcid: 5596920
Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020).
pubmed: 31778144
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
pubmed: 29750242
pmcid: 6137996
Thorvaldsdottir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 14, 178–192 (2013).
pubmed: 22517427
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
pubmed: 23422339
pmcid: 3624806
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
pubmed: 19541911
pmcid: 2752132
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
pubmed: 19451168
pmcid: 2705234
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
pubmed: 31375807
pmcid: 7605509
Barnett, D. W., Garrison, E. K., Quinlan, A. R., Stromberg, M. P. & Marth, G. T. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics 27, 1691–1692 (2011).
pubmed: 21493652
pmcid: 3106182
Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2018).
pubmed: 29233850
Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
pubmed: 34320186
pmcid: 8476166
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
pubmed: 32928274
pmcid: 7488777
Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275 (2019).
pubmed: 31843001
pmcid: 6913007
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).
pubmed: 32300014
pmcid: 7196820
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
pubmed: 26045719
pmcid: 4455052
Campbell, M. S., Holt, C., Moore, B. & Yandell, M. Genome annotation and curation using MAKER and MAKER-P. Curr. Protoc. Bioinformatics 48, 4.11.1–4.11.39 (2014).
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
pubmed: 10592178
pmcid: 102476
Cantalapiedra, C. P., Hernandez-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
pubmed: 34597405
pmcid: 8662613
Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).
pubmed: 30418610
Melters, D. P. et al. Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biol. 14, R10 (2013).
pubmed: 23363705
pmcid: 4053949
Shi, X. et al. The complete reference genome for grapevine (Vitis vinifera L.) genetics and breeding. Hortic. Res. 10, uhad061 (2023).
pubmed: 37213686
pmcid: 10199708
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
pubmed: 9862982
pmcid: 148217
Marcais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).
pubmed: 29373581
pmcid: 5802927
Goel, M., Sun, H., Jiao, W. B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
pubmed: 31842948
pmcid: 6913012
Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).
pubmed: 18436778
Chen, C. et al. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Mol. Plant 13, 1194–1202 (2020).
pubmed: 32585190
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP435127 (2023).
Liu, X. et al. Musa acuminata subsp. malaccensis genome assembly. GenBank https://identifiers.org/ncbi/insdc.gca:GCA_030219345.1 (2023).
Liu, X. et al. The phased telomere-to-telomere reference genome of Musa acuminata, a main contributor to banana cultivars. Figshare https://doi.org/10.6084/m9.figshare.22716271.v9 (2023).