Genome sequence of Kobresia littledalei, the first chromosome-level genome in the family Cyperaceae.
Journal
Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date:
11 06 2020
History:
received: 22 11 2019
accepted: 07 05 2020
entrez: 13 6 2020
pubmed: 13 6 2020
medline: 5 11 2020
Status:
epublish
Abstract
Kobresia plants are important forage resources on the Qinghai-Tibet Plateau and are essential for maintaining the ecological balance of its grasslands. Genome resources for Kobresia are therefore valuable for studying how these plants have adapted to the plateau. Combining PacBio, Illumina and Hi-C sequencing data, we assembled the genome of Kobresia littledalei C. B. Clarke. The assembly is about 373.85 Mb in size, with 96.82% of the bases anchored to 29 pseudo-chromosomes. Annotation identified 23,136 protein-coding genes, 98.95% of which were functionally annotated. Phylogenetic analysis indicates that K. littledalei (Cyperaceae) diverged from Poaceae about 97.6 million years ago (Mya), after diverging from Ananas comosus (Bromeliaceae) about 114.3 Mya. The result is a high-quality, chromosome-level genome for K. littledalei and the first reference genome established for a species of Cyperaceae. This genome will support further studies of how plants adapt to high-altitude, cold environments.
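The percentages in the abstract can be turned into absolute figures with simple arithmetic. The sketch below is illustrative only: the constants are the values quoted in the abstract, while the function name, the dictionary keys and the rounding are assumptions made for this example, and the derived numbers are not reported in the record itself.

```python
# Figures quoted in the abstract.
ASSEMBLY_SIZE_MB = 373.85        # total assembly size, in Mb
ANCHORED_FRACTION = 0.9682       # share of bases placed on pseudo-chromosomes
N_PSEUDOCHROMOSOMES = 29
N_GENES = 23_136                 # predicted protein-coding genes
ANNOTATED_FRACTION = 0.9895      # share of genes with a functional annotation
SPLIT_FROM_ANANAS_MYA = 114.3    # divergence from Ananas comosus
SPLIT_FROM_POACEAE_MYA = 97.6    # divergence from Poaceae


def derived_figures():
    """Return rounded values derived from the abstract (hypothetical helper)."""
    anchored_mb = ASSEMBLY_SIZE_MB * ANCHORED_FRACTION
    return {
        "anchored_mb": round(anchored_mb, 2),
        "unanchored_mb": round(ASSEMBLY_SIZE_MB - anchored_mb, 2),
        "mean_pseudochromosome_mb": round(anchored_mb / N_PSEUDOCHROMOSOMES, 2),
        "annotated_genes": round(N_GENES * ANNOTATED_FRACTION),
        "gap_between_splits_my": round(
            SPLIT_FROM_ANANAS_MYA - SPLIT_FROM_POACEAE_MYA, 1
        ),
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value}")
```

Under these inputs, roughly 362 Mb of sequence is anchored and about 22,900 genes carry a functional annotation. The mean pseudo-chromosome size is only an average and says nothing about the actual size distribution.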
Identifiers
pubmed: 32528014
doi: 10.1038/s41597-020-0518-3
pii: 10.1038/s41597-020-0518-3
pmc: PMC7289886
Publication types
Dataset
Journal Article
Languages
eng
Citation subsets
IM
Pagination
175