A high-quality genome of the early diverging tychoplanktonic diatom Paralia guyana.


Journal

Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
30 Oct 2024
History:
received: 25 Mar 2024
accepted: 29 Aug 2024
medline: 31 Oct 2024
pubmed: 31 Oct 2024
entrez: 31 Oct 2024
Status: epublish

Abstract

The diatom Paralia guyana is a tychoplanktonic microalgal species that represents one of the early diverging diatom lineages. P. guyana thrives in both planktonic and benthic habitats and contributes significantly to red tide events. Although a dozen diatom genomes have been sequenced, the identity of the early diverging diatoms remains elusive, and the absence of a high-quality genome assembly has limited our understanding of the evolutionary relationships and ecological adaptation mechanisms of P. guyana. In this study, the first high-quality genome assembly of the early diverging diatom P. guyana was generated using PacBio single-molecule sequencing. The assembled genome is 558.85 Mb in size, the largest diatom genome on record, with a contig N50 of 26.06 Mb. A total of 27,121 protein-coding genes were predicted in the P. guyana genome, of which 22,904 (84.45%) were functionally annotated. These data and analyses provide new genomic resources for tychoplanktonic microalgal species and shed light on the evolutionary origins of diatoms.
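The abstract summarizes assembly contiguity as a contig N50 and reports the share of predicted genes that received a functional annotation. As a minimal sketch of how such summary statistics are usually computed (not the authors' pipeline), N50 is the length L such that contigs of length L or longer together cover at least half of the total assembly size. The contig lengths below are hypothetical toy values, not data from the P. guyana assembly; only the gene counts come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings coverage to >= 50%.
        if 2 * running >= total:
            return length


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Hypothetical toy contig lengths in bp (NOT the P. guyana assembly).
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 40

# Gene counts reported in the abstract: annotated / predicted genes.
print(f"{percent(22_904, 27_121):.2f}%")  # 84.45%
```

For the toy input the total is 150 bp; the 50 bp contig alone covers one third, and adding the 40 bp contig crosses the halfway mark, so N50 is 40.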

Identifiers

pubmed: 39477953
doi: 10.1038/s41597-024-03843-7
pii: 10.1038/s41597-024-03843-7

Publication types

Journal Article; Dataset

Languages

eng

Citation subsets

IM

Pagination

1175

Copyright information

© 2024. The Author(s).

References

Falciatore, A., Jaubert, M., Bouly, J.-P., Bailleul, B. & Mock, T. Diatom Molecular Research Comes of Age: Model Species for Studying Phytoplankton Biology and Diversity. The Plant Cell 32, 547–572, https://doi.org/10.1105/tpc.19.00158 (2019).
doi: 10.1105/tpc.19.00158 pubmed: 31852772 pmcid: 7054031
Fu, W. et al. Diatom morphology and adaptation: Current progress and potentials for sustainable development. Sustainable Horizons 2, 100015, https://doi.org/10.1016/j.horiz.2022.100015 (2022).
doi: 10.1016/j.horiz.2022.100015
Tréguer, P. et al. Influence of diatom diversity on the ocean biological carbon pump. Nature Geoscience 11, 27–37, https://doi.org/10.1038/s41561-017-0028-x (2018).
doi: 10.1038/s41561-017-0028-x
Treguer, P. et al. The silica balance in the world ocean: a reestimate. Science 268, 375–379, https://doi.org/10.1126/science.268.5209.375 (1995).
doi: 10.1126/science.268.5209.375 pubmed: 17746543
Guiry, M. D. How Many Species of Algae Are There? Journal of phycology 48, 1057–1063, https://doi.org/10.1111/j.1529-8817.2012.01222.x (2012).
doi: 10.1111/j.1529-8817.2012.01222.x pubmed: 27011267
Nakov, T., Beaulieu, J. M. & Alverson, A. J. Accelerated diversification is related to life history and locomotion in a hyperdiverse lineage of microbial eukaryotes (Diatoms, Bacillariophyta). The New phytologist 219, 462–473, https://doi.org/10.1111/nph.15137 (2018).
doi: 10.1111/nph.15137 pubmed: 29624698 pmcid: 6099383
Armbrust, E. V. et al. The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science 306, 79–86, https://doi.org/10.1126/science.1101156 (2004).
doi: 10.1126/science.1101156 pubmed: 15459382
Filloramo, G. V., Curtis, B. A., Blanche, E. & Archibald, J. M. Re-examination of two diatom reference genomes using long-read sequencing. BMC genomics 22, 379, https://doi.org/10.1186/s12864-021-07666-3 (2021).
doi: 10.1186/s12864-021-07666-3 pubmed: 34030633 pmcid: 8147415
Bowler, C. et al. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature 456, 239–244, https://doi.org/10.1038/nature07410 (2008).
doi: 10.1038/nature07410 pubmed: 18923393
Lommer, M. et al. Genome and low-iron response of an oceanic diatom adapted to chronic iron limitation. Genome biology 13, R66, https://doi.org/10.1186/gb-2012-13-7-r66 (2012).
doi: 10.1186/gb-2012-13-7-r66 pubmed: 22835381 pmcid: 3491386
Tanaka, T. et al. Oil accumulation by the oleaginous diatom Fistulifera solaris as revealed by the genome and transcriptome. Plant Cell 27, 162–176, https://doi.org/10.1105/tpc.114.135194 (2015).
doi: 10.1105/tpc.114.135194 pubmed: 25634988 pmcid: 4330590
Liu, S., Xu, Q. & Chen, N. Expansion of photoreception-related gene families may drive ecological adaptation of the dominant diatom species Skeletonema marinoi. The Science of the total environment 897, 165384, https://doi.org/10.1016/j.scitotenv.2023.165384 (2023).
doi: 10.1016/j.scitotenv.2023.165384 pubmed: 37422237
Li, L. et al. The Draft Genome of the Centric Diatom Conticribra weissflogii (Coscinodiscophyceae, Ochrophyta). Protist 172, 125845, https://doi.org/10.1016/j.protis.2021.125845 (2021).
doi: 10.1016/j.protis.2021.125845 pubmed: 34916152
Kaczmarska, I. & Ehrman, J. M. Auxosporulation in Paralia guyana MacGillivary (Bacillariophyta) and Possible New Insights into the Habit of the Earliest Diatoms. PLoS One 10, e0141150, https://doi.org/10.1371/journal.pone.0141150 (2015).
doi: 10.1371/journal.pone.0141150 pubmed: 26485144 pmcid: 4618869
Liu, H. et al. Phytoplankton communities and its controlling factors in summer and autumn in the southern Yellow Sea, China. Acta Oceanologica Sinica 34, 114–123, https://doi.org/10.1007/s13131-015-0620-0 (2015).
doi: 10.1007/s13131-015-0620-0
Guillard, R. R. & Ryther, J. H. Studies of marine planktonic diatoms. I. Cyclotella nana Hustedt, and Detonula confervacea (cleve) Gran. Can J Microbiol 8, 229–239, https://doi.org/10.1139/m62-029 (1962).
doi: 10.1139/m62-029 pubmed: 13902807
Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience 7, 1–6, https://doi.org/10.1093/gigascience/gix120 (2018).
doi: 10.1093/gigascience/gix120 pubmed: 29659813 pmcid: 5827348
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
doi: 10.1093/bioinformatics/btr011 pubmed: 21217122 pmcid: 3051319
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204, https://doi.org/10.1093/bioinformatics/btx153 (2017).
doi: 10.1093/bioinformatics/btx153 pubmed: 28369201 pmcid: 5870704
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11, 1432, https://doi.org/10.1038/s41467-020-14998-3 (2020).
doi: 10.1038/s41467-020-14998-3 pubmed: 32188846
Chin, C.-S. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature methods 10, 563–569, https://doi.org/10.1038/nmeth.2474 (2013).
doi: 10.1038/nmeth.2474 pubmed: 23644548
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
doi: 10.1038/s41592-020-01056-5 pubmed: 33526886
Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460, https://doi.org/10.1186/s12859-018-2485-7 (2018).
doi: 10.1186/s12859-018-2485-7 pubmed: 30497373
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic acids research 35, W265–268, https://doi.org/10.1093/nar/gkm286 (2007).
doi: 10.1093/nar/gkm286 pubmed: 17485477
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences of the United States of America 117, 9451–9457, https://doi.org/10.1073/pnas.1921046117 (2020).
doi: 10.1073/pnas.1921046117 pubmed: 32300014
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6, 11, https://doi.org/10.1186/s13100-015-0041-9 (2015).
doi: 10.1186/s13100-015-0041-9 pubmed: 26045719
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl 1), i351–358, https://doi.org/10.1093/bioinformatics/bti1018 (2005).
doi: 10.1093/bioinformatics/bti1018 pubmed: 15961478
Stanke, M., Schoffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC bioinformatics 7, 62, https://doi.org/10.1186/1471-2105-7-62 (2006).
doi: 10.1186/1471-2105-7-62 pubmed: 16469098
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59, https://doi.org/10.1186/1471-2105-5-59 (2004).
doi: 10.1186/1471-2105-5-59 pubmed: 15144565 pmcid: 421630
Hongo, Y. et al. The genome of the diatom Chaetoceros tenuissimus carries an ancient integrated fragment of an extant virus. Sci Rep 11, 22877, https://doi.org/10.1038/s41598-021-00565-3 (2021).
doi: 10.1038/s41598-021-00565-3 pubmed: 34819553 pmcid: 8613185
Oliver, A. et al. Diploid genomic architecture of Nitzschia inconspicua, an elite biomass production diatom. Scientific Reports 11, 15592, https://doi.org/10.1038/s41598-021-95106-3 (2021).
doi: 10.1038/s41598-021-95106-3 pubmed: 34341414 pmcid: 8329260
Osuna-Cruz, C. M. et al. The Seminavis robusta genome provides insights into the evolutionary adaptations of benthic diatoms. Nature Communications 11, 3320, https://doi.org/10.1038/s41467-020-17191-8 (2020).
doi: 10.1038/s41467-020-17191-8 pubmed: 32620776 pmcid: 7335047
Roberts, W. R., Downey, K. M., Ruck, E. C., Traller, J. C. & Alverson, A. J. Improved Reference Genome for Cyclotella cryptica CCMP332, a Model for Cell Wall Morphogenesis, Salinity Adaptation, and Lipid Production in Diatoms (Bacillariophyta). G3 Genes|Genomes|Genetics 10, 2965–2974, https://doi.org/10.1534/g3.120.401408 (2020).
doi: 10.1534/g3.120.401408 pubmed: 32709619 pmcid: 7466962
Kent, W. J. BLAT–the BLAST-like alignment tool. Genome Res 12, 656–664, https://doi.org/10.1101/gr.229202 (2002).
doi: 10.1101/gr.229202 pubmed: 11932250 pmcid: 187518
Yang, Z. et al. Convergent horizontal gene transfer and cross-talk of mobile nucleic acids in parasitic plants. Nature Plants 5, 991–1001, https://doi.org/10.1038/s41477-019-0458-0 (2019).
doi: 10.1038/s41477-019-0458-0 pubmed: 31332314
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biology 20, 278, https://doi.org/10.1186/s13059-019-1910-1 (2019).
doi: 10.1186/s13059-019-1910-1 pubmed: 31842956 pmcid: 6912988
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC bioinformatics 12, 491, https://doi.org/10.1186/1471-2105-12-491 (2011).
doi: 10.1186/1471-2105-12-491 pubmed: 22192575 pmcid: 3280279
Paysan-Lafosse, T. et al. InterPro in 2022. Nucleic Acids Research 51, D418–D427, https://doi.org/10.1093/nar/gkac993 (2022).
doi: 10.1093/nar/gkac993 pmcid: 9825450
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59–60, https://doi.org/10.1038/nmeth.3176 (2015).
doi: 10.1038/nmeth.3176 pubmed: 25402007
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25–29, https://doi.org/10.1038/75556 (2000).
doi: 10.1038/75556 pubmed: 10802651
Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular biology and evolution 38, 4647–4654, https://doi.org/10.1093/molbev/msab199 (2021).
doi: 10.1093/molbev/msab199 pubmed: 34320186
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28125664 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28125665 (2024).
Jian, J. J. et al. A high-quality genome of the early diverging tychoplanktonic diatom Paralia guyana. figshare https://doi.org/10.6084/m9.figshare.25310971 (2024).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_041146295.1 (2024).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100, https://doi.org/10.1093/bioinformatics/bty191 (2018).
doi: 10.1093/bioinformatics/bty191 pubmed: 29750242 pmcid: 6137996

Authors

Jianbo Jian (J)

Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou, 515063, China.
Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark.
BGI Genomics, Shenzhen, China.

Feichao Du (F)

Laboratory of Marine Organism Taxonomy and Phylogeny, Qingdao Key Laboratory of Marine Biodiversity and Conservation, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.

Binhu Wang (B)

BGI Genomics, Shenzhen, China.

Xiaodong Fang (X)

BGI Genomics, Shenzhen, China.

Thomas Ostenfeld Larsen (TO)

Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark.

Yuhang Li (Y)

Laboratory of Marine Organism Taxonomy and Phylogeny, Qingdao Key Laboratory of Marine Biodiversity and Conservation, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China. liyuhang@qdio.ac.cn.

Eva C Sonnenschein (EC)

Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark. e.c.sonnenschein@swansea.ac.uk.
Department of Biosciences, Faculty of Science and Engineering, Swansea University, Swansea, Wales, UK. e.c.sonnenschein@swansea.ac.uk.
