A high-quality genome of the early diverging tychoplanktonic diatom Paralia guyana.
Journal
Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date:
30 Oct 2024
History:
received: 25 Mar 2024
accepted: 29 Aug 2024
medline: 31 Oct 2024
pubmed: 31 Oct 2024
entrez: 31 Oct 2024
Status:
epublish
Abstract
The diatom Paralia guyana is a tychoplanktonic microalgal species that represents one of the early diverging diatoms. P. guyana can thrive in both planktonic and benthic habitats and contributes significantly to the occurrence of red tide events. Although a dozen diatom genomes have been sequenced, the identity of the early diverging diatoms remains elusive. Understanding of the evolutionary clades and the mechanisms of ecological adaptation in P. guyana is limited by the absence of a high-quality genome assembly. In this study, the first high-quality genome assembly for the early diverging diatom P. guyana was established using PacBio single-molecule sequencing. The assembled genome spans 558.85 Mb, the largest diatom genome on record, with a contig N50 of 26.06 Mb. A total of 27,121 protein-coding genes were predicted in the P. guyana genome, of which 22,904 (84.45%) were functionally annotated. These data and analyses provide innovative genomic resources for tychoplanktonic microalgal species and shed light on the evolutionary origins of diatoms.
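The two summary statistics quoted in the abstract can be illustrated with a minimal Python sketch. It assumes the common definition of contig N50 (the length L such that contigs of length L or longer cover at least half of the assembly). The function name `n50` and the toy contig lengths are hypothetical and are not taken from the study; only the gene counts come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical toy assembly (arbitrary units), not the P. guyana contigs.
toy_contigs = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Share of predicted genes with a functional annotation, using the
# counts reported in the abstract (22,904 of 27,121).
annotated_pct = round(100 * 22_904 / 27_121, 2)

print(toy_n50, annotated_pct)
```

In the toy assembly the two longest contigs together cover 90 of 150 units, so the N50 is the length of the second contig. The percentage reproduces the 84.45% stated in the abstract.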
Identifiers
pubmed: 39477953
doi: 10.1038/s41597-024-03843-7
pii: 10.1038/s41597-024-03843-7
Publication types
Journal Article
Dataset
Languages
eng
Citation subsets
IM
Pagination
1175
Copyright information
© 2024. The Author(s).