The genome assembly and annotation of the cricket Gryllus longicercus.


Journal

Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
28 Jun 2024
History:
received: 16 04 2024
accepted: 19 06 2024
medline: 29 6 2024
pubmed: 29 6 2024
entrez: 28 6 2024
Status: epublish

Abstract

The order Orthoptera includes grasshoppers, katydids, and crickets. Some of its species are important for ecosystem stability and pollination, and others serve as research organisms in fields such as neurobiology, ecology, and evolution. Crickets, with more than 2,400 described species, are emerging as model research organisms because of their diversity, worldwide distribution, regeneration capacity, and characteristic acoustic communication. Here we report the first genome assembly and annotation of a New World cricket, Gryllus longicercus Weissman & Gray 2019. The genome assembly, generated by combining 44.54 Gb of PacBio long reads and 120.44 Gb of Illumina short reads, has a length of 1.85 Gb. The genome annotation yielded 19,715 transcripts from 14,789 gene models.
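
The figures quoted in the abstract can be put in proportion with a little arithmetic. The sketch below is illustrative only and is not part of the authors' pipeline: it divides each read yield by the assembly length to get an approximate fold coverage, and divides transcripts by gene models. It assumes the 1.85 Gb assembly length is a fair stand-in for genome size, which the abstract does not state, and the function and variable names are hypothetical.

```python
# Figures quoted in the abstract (Gb = gigabases).
LONG_READS_GB = 44.54        # PacBio long reads
SHORT_READS_GB = 120.44      # Illumina short reads
ASSEMBLY_LENGTH_GB = 1.85    # assembly length, used here as a proxy for genome size (assumption)
TRANSCRIPTS = 19_715
GENE_MODELS = 14_789


def fold_coverage(read_yield_gb: float, assembly_length_gb: float) -> float:
    """Return sequenced bases divided by assembly length (approximate depth)."""
    if assembly_length_gb <= 0:
        raise ValueError("assembly length must be positive")
    return read_yield_gb / assembly_length_gb


long_cov = fold_coverage(LONG_READS_GB, ASSEMBLY_LENGTH_GB)
short_cov = fold_coverage(SHORT_READS_GB, ASSEMBLY_LENGTH_GB)
transcripts_per_gene = TRANSCRIPTS / GENE_MODELS

print(f"Long-read coverage:   {long_cov:.1f}x")
print(f"Short-read coverage:  {short_cov:.1f}x")
print(f"Transcripts per gene: {transcripts_per_gene:.2f}")
```

Under that assumption the long reads amount to roughly 24x and the short reads to roughly 65x of the assembly length. These are raw-yield ratios, not measured mapping depths.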

Identifiers

pubmed: 38942791
doi: 10.1038/s41597-024-03554-z
pii: 10.1038/s41597-024-03554-z

Publication types

Dataset Journal Article

Languages

eng

Citation subsets

IM

Pagination

708

Copyright information

© 2024. The Author(s).

References

Weissman, D. B. & Gray, D. A. Crickets of the genus Gryllus in the United States (Orthoptera: Gryllidae: Gryllinae). Zootaxa 4705, (2019).
Gray, D. A., Gabel, E., Blankers, T. & Hennig, R. M. Multivariate female preference tests reveal latent perceptual biases. Proc. R. Soc. B Biol. Sci. 283, 20161972 (2016).
doi: 10.1098/rspb.2016.1972
Horch, H. W., Mito, T., Popadic, A., Ohuchi, H. & Noji, S. The Cricket as a Model Organism (Springer 2017).
Mito, T. et al. Cricket: The third domesticated insect. in Current Topics in Developmental Biology vol. 147 291–306 (Academic Press, 2022).
Supple, M. A. & Shapiro, B. Conservation of biodiversity in the genomics era. Genome Biol. 19, 131 (2018).
pubmed: 30205843 pmcid: 6131752 doi: 10.1186/s13059-018-1520-3
Blankers, T., Oh, K. P., Bombarely, A. & Shaw, K. L. The Genomic Architecture of a Rapid Island Radiation: Recombination Rate Variation, Chromosome Structure, and Genome Assembly of the Hawaiian Cricket Laupala. Genetics 209, 1329–1344 (2018).
pubmed: 29875253 pmcid: 6063224 doi: 10.1534/genetics.118.300894
Blankers, T., Oh, K. P., Bombarely, A. & Shaw, K. L. Laupala kohalensis isolate Lakoh051, whole genome shotgun sequencing project. GenBank https://www.ncbi.nlm.nih.gov/nuccore/NNCF00000000.1 (2017).
Pascoal, S. et al. Field cricket genome reveals the footprint of recent, abrupt adaptation in the wild. Evol. Lett. 4, 19–33 (2020).
pubmed: 32055408 doi: 10.1002/evl3.148
Kataoka, K. et al. The Draft Genome Dataset of the Asian Cricket Teleogryllus occipitalis for Molecular Research Toward Entomophagy. Front. Genet. 11, 470 (2020).
pubmed: 32457806 pmcid: 7225344 doi: 10.3389/fgene.2020.00470
Kataoka, K. et al. Teleogryllus occipitalis, whole genome shotgun sequencing project. GenBank http://www.ncbi.nlm.nih.gov/nuccore/BLKR00000000.1 (2020).
Gupta, Y. M. et al. Development of microsatellite markers for the house cricket, Acheta domesticus (Orthoptera: Gryllidae). Biodiversitas J. Biol. Divers. 21, 4094–4099 (2020).
doi: 10.13057/biodiv/d210921
Dossey, A. T. et al. Genome and Genetic Engineering of the House Cricket (Acheta domesticus): A Resource for Sustainable Agriculture. Biomolecules 13, 589 (2023).
pubmed: 37189337 pmcid: 10136058 doi: 10.3390/biom13040589
Dossey, A. T. et al. Acheta domesticus isolate BO2018_Ado_male_adult, whole genome shotgun sequencing project. GenBank https://www.ncbi.nlm.nih.gov/nuccore/JAHLJT000000000.1 (2023).
Ylla, G. et al. Insights into the genomic evolution of insects from cricket genomes. Commun. Biol. 4, 1–12 (2021).
doi: 10.1038/s42003-021-02197-9
Ylla, G. et al. Gryllus bimaculatus strain white eyes, whole genome shotgun sequencing project. GenBank https://www.ncbi.nlm.nih.gov/nuccore/BOPP00000000.1 (2022).
Satoh, A., Takasu, M., Yano, K. & Terai, Y. De novo assembly and annotation of the mangrove cricket genome. BMC Res. Notes 14, 387 (2021).
pubmed: 34627387 pmcid: 8502352 doi: 10.1186/s13104-021-05798-z
Satoh, A., Takasu, M., Yano, K. & Terai, Y. Apteronemobius asahinai, whole genome shotgun sequencing project. GenBank https://www.ncbi.nlm.nih.gov/nuccore/BPSV00000000.1 (2021).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
pubmed: 33526886 pmcid: 7961889 doi: 10.1038/s41592-020-01056-5
Cheng, H. et al. Haplotype-resolved assembly of diploid genomes without parental data. Nat. Biotechnol. 40, 1332–1335 (2022).
pubmed: 35332338 doi: 10.1038/s41587-022-01261-x
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 1–27 (2020).
doi: 10.1186/s13059-020-02134-9
Batut, B. et al. Community-Driven Data Analysis Training for Biology. Cell Syst. 6, 752–758.e1 (2018).
pubmed: 29953864 pmcid: 6296361 doi: 10.1016/j.cels.2018.05.012
Hiltemann, S. et al. Galaxy Training: A powerful framework for teaching! PLoS Comput. Biol. 19, e1010752 (2023).
pubmed: 36622853 pmcid: 9829167 doi: 10.1371/journal.pcbi.1010752
Lariviere, D. et al. VGP assembly pipeline. Galaxy Training Network https://training.galaxyproject.org/training-material/topics/assembly/tutorials/vgp_genome_assembly/tutorial.html (2021).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
pubmed: 32188846 pmcid: 7080791 doi: 10.1038/s41467-020-14998-3
Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460 (2018).
pubmed: 30497373 pmcid: 6267036 doi: 10.1186/s12859-018-2485-7
Park, B., Choi, E. H. & Hwang, U. W. Gryllus bimaculatus mitochondrion, complete genome. RefSeq https://www.ncbi.nlm.nih.gov/nuccore/NC_053546.1 (2023).
Torson, A. S., Hicks, A. M. A., Baragar, C. E., Smith, D. & Sinclair, B. J. Gryllus lineaticeps mitochondrion, complete genome. RefSeq https://www.ncbi.nlm.nih.gov/nuccore/NC_057052.1 (2023).
Torson, A. S., Hicks, A. M. A., Baragar, C. E., Smith, D. & Sinclair, B. J. Gryllus veletis mitochondrion, complete genome. RefSeq https://www.ncbi.nlm.nih.gov/nuccore/NC_057053.1 (2023).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34(18), 3094–3100 (2018).
pubmed: 29750242 pmcid: 6137996 doi: 10.1093/bioinformatics/bty191
Lau, M. J. et al. Aedes aegypti isolate YK_2018 mitochondrion, complete genome. GenBank https://www.ncbi.nlm.nih.gov/nuccore/OM214532.1 (2022).
Xiao, B. et al. Blattella germanica mitochondrion, complete genome. RefSeq https://www.ncbi.nlm.nih.gov/nuccore/NC_012901.1 (2023).
Wan, K. & Celniker, S. Drosophila melanogaster mitochondrion, complete genome. RefSeq https://www.ncbi.nlm.nih.gov/nuccore/NC_024511.2 (2023).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5), 1792–1797 (2004).
pubmed: 15034147 pmcid: 390337 doi: 10.1093/nar/gkh340
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLOS ONE 5(3), e9490 (2010).
pubmed: 20224823 pmcid: 2835736 doi: 10.1371/journal.pone.0009490
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
pubmed: 34320186 pmcid: 8476166 doi: 10.1093/molbev/msab199
Manni, M., Berkeley, M. R., Seppey, M. & Zdobnov, E. M. BUSCO: Assessing Genomic Data Quality and Beyond. Curr. Protoc. 1, e323 (2021).
pubmed: 34936221 doi: 10.1002/cpz1.323
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
pubmed: 26045719 pmcid: 4455052 doi: 10.1186/s13100-015-0041-9
Smit, A., Hubley, R. & Green, P. RepeatMasker Open-4.0 (2015).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics Bioinforma. 3, lqaa108 (2021).
doi: 10.1093/nargab/lqaa108
Hoff, K. J., Lomsadze, A., Borodovsky, M. & Stanke, M. Whole-Genome Annotation with BRAKER. Methods Mol. Biol. Clifton NJ 1962, 65–95 (2019).
doi: 10.1007/978-1-4939-9173-0_5
Hoff, K. J., Lange, S., Lomsadze, A., Borodovsky, M. & Stanke, M. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinforma. Oxf. Engl. 32, 767–769 (2016).
doi: 10.1093/bioinformatics/btv661
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
pubmed: 18218656 doi: 10.1093/bioinformatics/btn013
Stanke, M., Schöffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7, 1–11 (2006).
doi: 10.1186/1471-2105-7-62
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
pubmed: 25402007 doi: 10.1038/nmeth.3176
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
pubmed: 19505943 pmcid: 2723002 doi: 10.1093/bioinformatics/btp352
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
pubmed: 31842956 pmcid: 6912988 doi: 10.1186/s13059-019-1910-1
Pertea, G. & Pertea, M. GFF Utilities: GffRead and GffCompare. F1000Research 9, ISCB Comm J-304 (2020).
pubmed: 32489650 pmcid: 7222033 doi: 10.12688/f1000research.23297.1
Quinlan, A. R. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr. Protoc. Bioinforma. 47, 11.12.1–34 (2014).
doi: 10.1002/0471250953.bi1112s47
Iwata, H. & Gotoh, O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Res. 40, e161 (2012).
pubmed: 22848105 pmcid: 3488211 doi: 10.1093/nar/gks708
Bruna, T., Lomsadze, A. & Borodovsky, M. GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data. BioRxiv Prepr. Serv. Biol. 2023.01.13.524024 (2023).
Gotoh, O. A space-efficient and accurate method for mapping and aligning cDNA sequences onto genomic sequence. Nucleic Acids Res. 36, 2630–2638 (2008).
pubmed: 18344523 pmcid: 2377433 doi: 10.1093/nar/gkn105
Kuznetsov, D. et al. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res. 51, D445–D451 (2023).
pubmed: 36350662 doi: 10.1093/nar/gkac998
FelixKrueger/TrimGalore: v0.6.10 - add default decompression path. Zenodo https://doi.org/10.5281/zenodo.5127898 (2023).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
pubmed: 31375807 pmcid: 7605509 doi: 10.1038/s41587-019-0201-4
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
pubmed: 2231712 doi: 10.1016/S0022-2836(05)80360-2
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
pubmed: 20003500 pmcid: 2803857 doi: 10.1186/1471-2105-10-421
The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
doi: 10.1093/nar/gkac1052
Blum, M. et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 49, D344–D354 (2021).
pubmed: 33156333 doi: 10.1093/nar/gkaa977
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinforma. Oxf. Engl. 30, 1236–1240 (2014).
doi: 10.1093/bioinformatics/btu031
BBMap. SourceForge, https://sourceforge.net/projects/bbmap/ (2023).
Szrajer, S., Gray, D. & Ylla, G. Gryllus longicercus isolate DAG 2021-001, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JAZDUA000000000.1 (2024).
Szrajer, S., Ylla, G. & Gray, D. The genome assembly and annotation of the cricket Gryllus longicercus. figshare https://doi.org/10.6084/m9.figshare.26003989.v2 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP485514 (2024).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
pubmed: 31727128 pmcid: 6857279 doi: 10.1186/s13059-019-1832-y
Emms, D. M. & Kelly, S. STRIDE: Species Tree Root Inference from Gene Duplication Events. Mol. Biol. Evol. 34, 3267–3278 (2017).
pubmed: 29029342 pmcid: 5850722 doi: 10.1093/molbev/msx259
Emms, D. M. & Kelly, S. STAG: Species Tree Inference from All Genes. Preprint at http://biorxiv.org/lookup/doi/10.1101/267914 (2018).

Authors

Szymon Szrajer (S)

Laboratory of Bioinformatics and Genome Biology, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Kraków, 30-387, Poland.

David Gray (D)

Department of Biology, California State University Northridge, Northridge, CA, 91330-8303, USA. dave.gray@csun.edu.

Guillem Ylla (G)

Laboratory of Bioinformatics and Genome Biology, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Kraków, 30-387, Poland. guillem.ylla@uj.edu.pl.
