ModEst: Accurate estimation of genome size from next generation sequencing data.
Biodiversity genomics
Next generation sequencing
comparative genomics
genome trait
Journal
Molecular ecology resources
ISSN: 1755-0998
Abbreviated title: Mol Ecol Resour
Country: England
NLM ID: 101465604
Publication information
Publication date:
May 2022
History:
revised: 29 November 2021
received: 19 July 2021
accepted: 30 November 2021
pubmed: 10 December 2021
medline: 7 April 2022
entrez: 9 December 2021
Status:
ppublish
Abstract
Accurate estimates of genome sizes are important parameters for both theoretical and practical biodiversity genomics. Here we present a fast, easy-to-implement and accurate method to estimate genome size from the number of bases sequenced and the mean sequencing depth. To estimate the latter, we take advantage of the fact that an accurate estimation of the Poisson distribution parameter lambda is possible from truncated data, restricted to the part of the sequencing depth distribution representing the true underlying distribution. With simulations we show that reasonable genome size estimates can be gained even from low-coverage (10×), highly discontinuous genome drafts. Comparison of estimates from a wide range of taxa and sequencing strategies with flow cytometry estimates of the same individuals showed a very good fit and suggested that both methods yield comparable, interchangeable results.
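The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the histogram layout (a mapping from depth to number of positions) and the truncation bounds `lo` and `hi` are all assumptions made for illustration. It relies on the standard result that, for a Poisson distribution truncated to a fixed interval, the maximum-likelihood value of lambda is the one whose truncated mean equals the observed mean of the retained data; because that truncated mean increases with lambda, it can be found by bisection. Genome size then follows as bases sequenced divided by lambda.

```python
import math


def truncated_mean(lam, lo, hi):
    """Mean of a Poisson(lam) variable conditioned on lo <= X <= hi."""
    depths = range(lo, hi + 1)
    logp = [k * math.log(lam) - lam - math.lgamma(k + 1) for k in depths]
    top = max(logp)  # subtract the maximum to avoid underflow
    weights = [math.exp(v - top) for v in logp]
    return sum(k * w for k, w in zip(depths, weights)) / sum(weights)


def fit_lambda(depth_counts, lo, hi, tol=1e-9):
    """Estimate lambda from a depth histogram truncated to [lo, hi].

    depth_counts maps sequencing depth -> number of positions (hypothetical
    input layout). Depths outside [lo, hi] are ignored.
    """
    kept = {k: n for k, n in depth_counts.items() if lo <= k <= hi}
    total = sum(kept.values())
    target = sum(k * n for k, n in kept.items()) / total
    if not lo < target < hi:
        raise ValueError("observed mean must lie strictly inside (lo, hi)")
    low, high = 1e-9, float(hi)
    # Widen the bracket until the truncated mean reaches the observed mean.
    while truncated_mean(high, lo, hi) < target:
        high *= 2.0
    # Bisection: the truncated mean is monotonically increasing in lambda.
    while high - low > tol:
        mid = 0.5 * (low + high)
        if truncated_mean(mid, lo, hi) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def genome_size(total_bases, lam):
    """Genome size estimate: bases sequenced divided by mean depth."""
    return total_bases / lam


# Illustrative data: an ideal Poisson depth profile plus artificial
# low-depth and high-depth excess that the truncation is meant to exclude.
true_lam, genome_len = 10.0, 1_000_000
counts = {
    k: genome_len * math.exp(k * math.log(true_lam) - true_lam - math.lgamma(k + 1))
    for k in range(0, 61)
}
counts[1] += 50_000   # hypothetical low-depth noise
counts[40] += 20_000  # hypothetical collapsed-repeat signal
lam_hat = fit_lambda(counts, lo=5, hi=20)
print(lam_hat, genome_size(true_lam * genome_len, lam_hat))
```

In this sketch the truncation bounds are chosen by hand; how the part of the depth distribution "representing the true underlying distribution" is selected in practice is not specified in the abstract.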
Identifiers
pubmed: 34882987
doi: 10.1111/1755-0998.13570
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
1454-1464
Grants
Agency: LOEWE Translational Biodiversity Genomics Centre
Copyright information
© 2021 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd.
References
Agudo, A. B., Torices, R., Loureiro, J., Castro, S., Castro, M., & Álvarez, I. (2019). Genome size variation in a hybridizing diploid species complex in Anacyclus (Asteraceae: Anthemideae). International Journal of Plant Sciences, 180, 374-385.
Asalone, K. C., Ryan, K. M., Yamadi, M., Cohen, A. L., Farmer, W. G., George, D. J., Joppert, C., Kim, K., Mughal, M. F., Said, R., Toksoz-Exley, M., Bisk, E., & Bracht, J. R. (2020). Regional sequence expansion or collapse in heterozygous genome assemblies. PLoS Computational Biology, 16, e1008104. https://doi.org/10.1371/journal.pcbi.1008104
Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., Lesin, V. M., Nikolenko, S. I., Pham, S., Prjibelski, A. D., Pyshkin, A. V., Sirotkin, A. V., Vyahhi, N., Tesler, G., Alekseyev, M. A., & Pevzner, P. A. (2012). SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology, 19, 455-477. https://doi.org/10.1089/cmb.2012.0021
Bennett, M. D., & Leitch, I. J. (2005). Genome size evolution in plants. In T. R. Gregory ed., The evolution of the genome (pp. 89-162). Elsevier.
Blommaert, J., Riss, S., Hecox-Lea, B., Mark Welch, D. B., & Stelzer, C. P. (2019). Small, but surprisingly repetitive genomes: transposon expansion and not polyploidy has driven a doubling in genome size in a metazoan species complex. BMC Genomics, 20, 466. https://doi.org/10.1186/s12864-019-5859-y
Böhning, D., & Schön, D. (2005). Nonparametric maximum likelihood estimation of population size based on the counting distribution. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54, 721-737.
Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics, 30, 2114-2120. https://doi.org/10.1093/bioinformatics/btu170
Carta, A., Bedini, G., & Peruzzi, L. (2020). A deep dive into the ancestral chromosome number and genome size of flowering plants. New Phytologist, 228, 1097-1106. https://doi.org/10.1111/nph.16668
Challis, R., Richards, E., Rajan, J., Cochrane, G., & Blaxter, M. (2020). BlobToolKit - Interactive quality assessment of genome assemblies. G3: Genes, Genomes, Genetics, 10, 1361-1374. https://doi.org/10.1534/g3.119.400908
Chueca, L., Kochmann, J., Schell, T. et al (2021). De novo genome assembly of the raccoon dog (Nyctereutes procyonoides). Frontiers in Genetics, 12, 658256. https://doi.org/10.3389/fgene
Chueca, L. J., Schell, T., & Pfenninger, M. (2021). De novo genome assembly of the land snail Candidula unifasciata (Mollusca: Gastropoda). G3: Genes, Genomes, Genetics, 11, jkab180.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2013). Applied multiple regression/correlation analysis for the behavioral sciences. Routledge.
Danecek, P., Bonfield, J. K., Liddle, J., Marshall, J., Ohan, V., Pollard, M. O., Whitwham, A., Keane, T., McCarthy, S. A., Davies, R. M., & Li, H. (2021). Twelve years of SAMtools and BCFtools. GigaScience, 10, giab008. https://doi.org/10.1093/gigascience/giab008
David, F., & Johnson, N. (1952). The truncated poisson. Biometrics, 8, 275-285. https://doi.org/10.2307/3001863
Delignette-Muller, M. L., & Dutang, C. (2015). fitdistrplus: An R package for fitting distributions. Journal of Statistical Software, 64, 1-34.
Dodsworth, S. (2015). Genome skimming for next-generation biodiversity analysis. Trends in Plant Science, 20, 525-527. https://doi.org/10.1016/j.tplants.2015.06.012
Doležel, J., & Greilhuber, J. (2010). Nuclear genome size: are we getting closer? Cytometry Part A, 77, 635-642. https://doi.org/10.1002/cyto.a.20915
Elliott, T. A., & Gregory, T. R. (2015). What's in a genome? The C-value enigma and the evolution of eukaryotic genome content. Philosophical Transactions of the Royal Society B: Biological Sciences, 370, 20140331. https://doi.org/10.1098/rstb.2014.0331
Fountain, E. D., Pauli, J. N., Reid, B. N., Palsbøll, P. J., & Peery, M. Z. (2016). Finding the right coverage: The impact of coverage and sequence quality on single nucleotide polymorphism genotyping error rates. Molecular Ecology Resources, 16, 966-978. https://doi.org/10.1111/1755-0998.12519
García-Alcalde, F., Okonechnikov, K., Carbonell, J., Cruz, L. M., Götz, S., Tarazona, S., Dopazo, J., Meyer, T. F., & Conesa, A. (2012). Qualimap: Evaluating next-generation sequencing alignment data. Bioinformatics, 28, 2678-2679. https://doi.org/10.1093/bioinformatics/bts503
Gardner, J. D., Laurin, M., & Organ, C. L. (2020). The relationship between genome size and metabolic rate in extant vertebrates. Philosophical Transactions of the Royal Society B: Biological Sciences, 375, 20190146. https://doi.org/10.1098/rstb.2019.0146
Hare, E. E., & Johnston, J. S. (2012). Genome size determination using flow cytometry of propidium iodide-stained nuclei. In V. Orogozo & M. V. Rockmann eds., Molecular methods for evolutionary genetics (pp. 3-12). Humana Press.
Hartke, J., Schell, T., Jongepier, E., Schmidt, H., Sprenger, P. P., Paule, J., Bornberg-Bauer, E., Schmitt, T., Menzel, F., Pfenninger, M., & Feldmeyer, B. (2019). Hybrid genome assembly of a neotropical mutualistic ant. Genome Biology and Evolution, 11, 2306-2311. https://doi.org/10.1093/gbe/evz159
Heckenhauer, J., Frandsen, P. B., Gupta, D. K., Paule, J., Prost, S., Schell, T., Schneider, J. V., Stewart, R. J., & Pauls, S. U. (2019). Annotated draft genomes of two caddisfly species Plectrocnemia conspersa CURTIS and Hydropsyche tenuis NAVAS (Insecta: Trichoptera). Genome Biology and Evolution, 11, 3445-3451. https://doi.org/10.1093/gbe/evz264
Heckenhauer, J., Frandsen, P. B., Sproul, J. S. et al (2021). Genome size evolution in the diverse insect order Trichoptera. bioRxiv. https://doi.org/10.1101/2021.05.10.443368
Huang, W., Li, L., Myers, J. R., & Marth, G. T. (2012). ART: A next-generation sequencing read simulator. Bioinformatics, 28, 593-594. https://doi.org/10.1093/bioinformatics/btr708
Johnston, J. S., Bernardini, A., & Hjelmen, C. E. (2019). Genome size estimation and quantitative cytogenetics in insects. In S. Brown & M. Pfrender eds., Insect genomics (pp. 15-26). Springer.
Kapusta, A., Suh, A., & Feschotte, C. (2017). Dynamics of genome size evolution in birds and mammals. Proceedings of the National Academy of Sciences of the United States of America, 114, E1460-E1469. https://doi.org/10.1073/pnas.1616702114
Keyl, H.-G. (1965). A demonstrable local and geometric increase in the chromosomal DNA of Chironomus. Experientia, 21, 191-193. https://doi.org/10.1007/BF02141878
Kumar, S., Jones, M., Koutsovoulos, G., Clarke, M., & Blaxter, M. (2013). Blobology: Exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots. Frontiers in Genetics, 4, 237. https://doi.org/10.3389/fgene.2013.00237
Lefébure, T., Morvan, C., Malard, F., François, C., Konecny-Dupré, L., Guéguen, L., Weiss-Gayet, M., Seguin-Orlando, A., Ermini, L., Sarkissian, C. D., Charrier, N. P., Eme, D., Mermillod-Blondin, F., Duret, L., Vieira, C., Orlando, L., & Douady, C. J. (2017). Less effective selection leads to larger genomes. Genome Research, 27, 1016-1028. https://doi.org/10.1101/gr.212589.116
Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997.
Li, X., & Waterman, M. S. (2003). Estimating the repeat structure and length of DNA sequences using ℓ-tuples. Genome Research, 13, 1916-1922. https://doi.org/10.1101/gr.1251803
Lipovský, M., Vinar, T., & Brejova, B. (2017). Approximate abundance histograms and their use for genome size estimation. ITAT, 2017, 27-34.
Lynch, M., & Conery, J. S. (2003). The origins of genome complexity. Science, 302, 1401-1404. https://doi.org/10.1126/science.1089370
Marçais, G., & Kingsford, C. (2011). A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 27, 764-770. https://doi.org/10.1093/bioinformatics/btr011
Mishra, B., Ploch, S., Runge, F., Schmuker, A., Xia, X., Gupta, D. K., Sharma, R., & Thines, M. (2020). The genome of Microthlaspi erraticum (Brassicaceae) provides insights into the adaptation to highly calcareous soils. Frontiers in Plant Science, 11, 943. https://doi.org/10.3389/fpls.2020.00943
Mishra, B., Ulaszewski, B., Meger, J. et al (2021). A chromosome-level genome assembly of the European Beech (Fagus sylvatica) reveals anomalies for organelle DNA integration, repeat content and distribution of SNPs. bioRxiv. https://doi.org/10.1101/2021.03.22.436437
Nadarajah, S., & Kotz, S. (2006). R programs for computing truncated distributions. Journal of Statistical Software, 16, Code Snippet 2.
Nickel, J. H., Schell, T., Holtzem, T. et al (2021). Hybridization dynamics and extensive introgression in the Daphnia longispina species complex: New insights from a high-quality Daphnia galeata reference genome. bioRxiv. https://doi.org/10.1101/2021.02.01.429177
Novák, P., Guignard, M. S., Neumann, P., Kelly, L. J., Mlinarec, J., Koblížková, A., Dodsworth, S., Kovařík, A., Pellicer, J., Wang, W., Macas, J., Leitch, I. J., & Leitch, A. R. (2020). Repeat-sequence turnover shifts fundamentally in species with large genomes. Nature Plants, 6, 1325-1329. https://doi.org/10.1038/s41477-020-00785-x
Oliver, M. J., Petrov, D., Ackerly, D., Falkowski, P., & Schofield, O. M. (2007). The mode and tempo of genome size evolution in eukaryotes. Genome Research, 17, 594-601. https://doi.org/10.1101/gr.6096207
Petrov, D. A. (2001). Evolution of genome size: new approaches to an old problem. Trends in Genetics, 17, 23-28. https://doi.org/10.1016/S0168-9525(00)02157-0
Pflug, J. M., Holmes, V. R., Burrus, C. et al (2020). Measuring genome sizes using read-depth, k-mers, and flow cytometry: methodological comparisons in beetles (Coleoptera). G3: Genes, Genomes, Genetics, 10, 3047-3060.
Poptsova, M. S., Il'Icheva, I. A., Nechipurenko, D. Y. et al (2014). Non-random DNA fragmentation in next-generation sequencing. Scientific Reports, 4, 1-6.
Prokopowich, C. D., Gregory, T. R., & Crease, T. J. (2003). The correlation between rDNA copy number and genome size in eukaryotes. Genome, 46, 48-50. https://doi.org/10.1139/g02-103
Pucker, B. (2019). Mapping-based genome size estimation. bioRxiv. https://doi.org/10.1101/607390
Quinlan, A. R., & Hall, I. M. (2010). BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics, 26, 841-842. https://doi.org/10.1093/bioinformatics/btq033
Schell, T., Feldmeyer, B., Schmidt, H., Greshake, B., Tills, O., Truebano, M., Rundle, S. D., Paule, J., Ebersberger, I., & Pfenninger, M. (2017). An annotated draft genome for Radix auricularia (Gastropoda, Mollusca). Genome Biology and Evolution, 9, 585-592. https://doi.org/10.1093/gbe/evx032
Sims, D., Sudbery, I., Ilott, N. E., Heger, A., & Ponting, C. P. (2014). Sequencing depth and coverage: Key considerations in genomic analyses. Nature Reviews Genetics, 15, 121-132. https://doi.org/10.1038/nrg3642
Sohn, J.-I., & Nam, J.-W. (2018). The present and future of de novo whole-genome assembly. Briefings in Bioinformatics, 19, 23-40.
Treangen, T. J., & Salzberg, S. L. (2012). Repetitive DNA and next-generation sequencing: Computational challenges and solutions. Nature Reviews Genetics, 13, 36-46. https://doi.org/10.1038/nrg3117
Vitales, D., Álvarez, I., Garcia, S., Hidalgo, O., Nieto Feliner, G., Pellicer, J., Vallès, J., & Garnatje, T. (2020). Genome size variation at constant chromosome number is not correlated with repetitive DNA dynamism in Anacyclus (Asteraceae). Annals of Botany, 125, 611-623. https://doi.org/10.1093/aob/mcz183
Vurture, G. W., Sedlazeck, F. J., Nattestad, M., Underwood, C. J., Fang, H., Gurtowski, J., & Schatz, M. C. (2017). GenomeScope: Fast reference-free genome profiling from short reads. Bioinformatics, 33, 2202-2204. https://doi.org/10.1093/bioinformatics/btx153
Wang, J., Liu, J., & Kang, M. (2015). Quantitative testing of the methodology for genome size estimation in plants using flow cytometry: A case study of the Primulina genus. Frontiers in Plant Science, 6, 354. https://doi.org/10.3389/fpls.2015.00354
Winter, S., Prost, S., de Raad, J., Coimbra, R. T. F., Wolf, M., Nebenführ, M., Held, A., Kurzawe, M., Papapostolou, R., Tessien, J., Bludau, J., Kelch, A., Gronefeld, S., Schöneberg, Y., Zeitz, C., Zapf, K., Prochotta, D., Murphy, M., Sheffer, M. M., … Janke, A. (2020). Chromosome-level genome assembly of a benthic associated Syngnathiformes species: The common dragonet, Callionymus lyra. Gigabyte, 2020, 1-10. https://doi.org/10.46471/gigabyte.6