Chromosome genome assembly and annotation of Adzuki Bean (Vigna angularis).
Journal
Scientific Data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date:
02 Oct 2024
History:
received: 19 Jun 2023
accepted: 23 Sep 2024
medline: 3 Oct 2024
pubmed: 3 Oct 2024
entrez: 2 Oct 2024
Status:
epublish
Abstract
Adzuki bean (Vigna angularis) is a significant dietary legume crop that is prevalent in East Asia. It also holds traditional medicinal importance in China. In this study, we report a high-quality, chromosome-level genome assembly of adzuki bean obtained by employing Illumina short-read sequencing, PacBio long-read sequencing, and Hi-C technology. The assembly spans 447.8 Mb, encompassing 96.32% of the estimated genome, with contig and scaffold N50 values of 16.5 and 41.0 Mb, respectively. More than 98.2% of the 1,614 BUSCO genes were fully identified, and 25,939 genes were annotated, with 98.23% of them being functionally identifiable. Vigna angularis was estimated to diverge successively from Vigna unguiculata and Vigna radiata about 15.3 and 8.7 million years ago (Ma), respectively. This chromosome-level reference genome of Vigna angularis provides a robust foundation for exploring the functional genomics and genome evolution of adzuki bean, thereby facilitating advancements in molecular breeding of adzuki bean.
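The contiguity figures quoted in the abstract (contig and scaffold N50) follow a standard definition, and the reported coverage implies an estimated genome size. A minimal sketch of both calculations is given below; the function name `n50` and the toy contig lengths are hypothetical illustrations and are not taken from the study's data or pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy contig lengths (in Mb), not data from the study.
toy_contigs = [40, 30, 20, 5, 3, 2]
print(n50(toy_contigs))  # 30

# Genome size implied by the abstract: 447.8 Mb is 96.32% of the estimate.
implied_genome_mb = 447.8 / 0.9632
print(round(implied_genome_mb, 1))  # 464.9
```

Under these assumptions, the abstract's figures imply an estimated genome size of roughly 465 Mb.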
Identifiers
pubmed: 39358398
doi: 10.1038/s41597-024-03911-y
pii: 10.1038/s41597-024-03911-y
Publication types
Journal Article
Dataset
Languages
eng
Citation subsets
IM
Pagination
1074
Copyright information
© 2024. The Author(s).