High-quality metagenome assembly from long accurate reads with metaMDBG.


Journal

Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
02 Jan 2024
History:
received: 17 Feb 2023
accepted: 08 Sep 2023
medline: 04 Jan 2024
pubmed: 04 Jan 2024
entrez: 03 Jan 2024
Status: aheadofprint

Abstract

We introduce metaMDBG, a metagenomics assembler for PacBio HiFi reads. MetaMDBG combines de Bruijn graph assembly in minimizer space with two further components: an iterative assembly over sequences of minimizers, which addresses variations in genome coverage depth, and an abundance-based filtering strategy, which simplifies strain complexity. For complex communities, we obtained up to twice as many high-quality circularized prokaryotic metagenome-assembled genomes as existing methods, along with better recovery of viruses and plasmids.
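To make "de Bruijn graph assembly in minimizer space" concrete, the following is a minimal Python sketch in the spirit of minimizer-space de Bruijn graphs (Ekim et al., Cell Syst. 2021, in the references below). It is not metaMDBG's implementation. Assumptions: minimizers are chosen with classic lexicographic (w, l) windows (Roberts et al. 2004) rather than the selection scheme the published tools use; reverse complements and sequencing errors are ignored; the function names `window_minimizers`, `build_minimizer_space_dbg` and `walk_linear_path` are hypothetical. The iterative multi-k assembly and abundance-based filtering described above are not modelled.

```python
from collections import defaultdict


def window_minimizers(seq, l, w):
    """Classic (w, l) window minimizers (Roberts et al. 2004): in every window of
    w consecutive l-mers keep the lexicographically smallest (leftmost on ties).
    Each selected position is reported once, in sequence order.
    Assumption: lexicographic order, not the hash-based order real assemblers use."""
    lmers = [seq[i:i + l] for i in range(len(seq) - l + 1)]
    chosen, last_pos = [], -1
    for start in range(len(lmers) - w + 1):
        offset = min(range(w), key=lambda j: lmers[start + j])
        pos = start + offset
        if pos != last_pos:  # consecutive windows often share the same minimizer
            chosen.append(lmers[pos])
            last_pos = pos
    return chosen


def build_minimizer_space_dbg(reads, l, w, k):
    """Nodes are (k-1)-min-mers (tuples of k-1 consecutive minimizers); each
    k-min-mer seen in a read adds an edge from its prefix node to its suffix node.
    Also returns how many times each k-min-mer was observed (its abundance).
    Simplification: reads are taken in forward orientation only."""
    edges = defaultdict(set)
    counts = defaultdict(int)
    for read in reads:
        mins = window_minimizers(read, l, w)
        for i in range(len(mins) - k + 1):
            kmm = tuple(mins[i:i + k])
            counts[kmm] += 1
            edges[kmm[:-1]].add(kmm[1:])
    return edges, counts


def walk_linear_path(edges, start):
    """Follow out-degree-1 edges from `start` and spell the path as a list of
    minimizers. Simplification: in-degree is ignored, so this is not a full
    unitig construction."""
    path = list(start)
    node, seen = start, {start}
    while len(edges.get(node, ())) == 1:
        nxt = next(iter(edges[node]))
        if nxt in seen:  # stop on cycles
            break
        path.append(nxt[-1])
        seen.add(nxt)
        node = nxt
    return path


if __name__ == "__main__":
    reads = ["ACGTTGCA", "ACGTTGCA"]
    edges, counts = build_minimizer_space_dbg(reads, l=2, w=3, k=3)
    print(walk_linear_path(edges, ("AC", "CG")))  # ['AC', 'CG', 'GT', 'GC', 'CA']
```

The point of the design is that graph nodes are short tuples of minimizers instead of base-level k-mers, so only the selected positions of each read enter the graph; base sequence would be recovered afterwards from the reads.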

Identifiers

pubmed: 38168989
doi: 10.1038/s41587-023-01983-6
pii: 10.1038/s41587-023-01983-6

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: RCUK | Natural Environment Research Council (NERC)
ID: NE/T013230/1
Agency: RCUK | Medical Research Council (MRC)
ID: MR/S037195/1
Agency: RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
ID: BB/CSP1720/1
Agency: RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
ID: BS/E/T/000PR9818
Agency: RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
ID: BBS/E/T/000PR9817
Agency: RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
ID: BB/N023285/1
Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 956229

Copyright information

© 2024. The Author(s).

References

Quince, C., Walker, A. W., Simpson, J. T., Loman, N. J. & Segata, N. Shotgun metagenomics, from sampling to analysis. Nat. Biotechnol. 35, 833–844 (2017).
doi: 10.1038/nbt.3935 pubmed: 28898207
The Human Microbiome Project Consortium. A framework for human microbiome research. Nature 486, 215–221 (2012).
doi: 10.1038/nature11209
Edgar, R. C. et al. Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147 (2022).
doi: 10.1038/s41586-021-04332-2 pubmed: 35082445
Quince, C. et al. STRONG: metagenomics strain resolution on assembly graphs. Genome Biol. 22, 214 (2021).
doi: 10.1186/s13059-021-02419-7 pubmed: 34311761 pmcid: 8311964
Vicedomini, R., Quince, C., Darling, A. E. & Chikhi, R. Strainberry: automated strain separation in low-complexity metagenomes using long reads. Nat. Commun. 12, 4485 (2021).
doi: 10.1038/s41467-021-24515-9 pubmed: 34301928 pmcid: 8302730
Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
doi: 10.1038/nmeth.3103 pubmed: 25218180
Moss, E. L., Maghini, D. G. & Bhatt, A. S. Complete, closed bacterial genomes from microbiomes using nanopore sequencing. Nat. Biotechnol. 38, 701–707 (2020).
doi: 10.1038/s41587-020-0422-6 pubmed: 32042169 pmcid: 7283042
Sereika, M. et al. Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing. Nat. Methods 19, 823–826 (2022).
doi: 10.1038/s41592-022-01539-7 pubmed: 35789207 pmcid: 9262707
Liu, L., Yang, Y., Deng, Y. & Zhang, T. Nanopore long-read-only metagenomics enables complete and high-quality genome reconstruction from mock and complex metagenomes. Microbiome 10, 209 (2022).
doi: 10.1186/s40168-022-01415-8 pubmed: 36457010 pmcid: 9716684
Bickhart, D. M. et al. Generating lineage-resolved, complete metagenome-assembled genomes from complex microbial communities. Nat. Biotechnol. 40, 711–719 (2022).
doi: 10.1038/s41587-021-01130-z pubmed: 34980911
Reiter, T. E. & Brown, C. T. MAGs achieve lineage resolution. Nat. Microbiol. 7, 193–194 (2022).
doi: 10.1038/s41564-021-01027-2 pubmed: 34980920
Myers, E. W. The fragment assembly string graph. Bioinformatics 21, ii79–ii85 (2005).
doi: 10.1093/bioinformatics/bti1114 pubmed: 16204131
Idury, R. M. & Waterman, M. S. A new algorithm for DNA sequence assembly. J. Comput. Biol. 2, 291–306 (1995).
doi: 10.1089/cmb.1995.2.291 pubmed: 7497130
Feng, X., Cheng, H., Portik, D. & Li, H. Metagenome assembly of high-fidelity long reads with hifiasm-meta. Nat. Methods 19, 671–674 (2022).
doi: 10.1038/s41592-022-01478-3 pubmed: 35534630 pmcid: 9343089
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
doi: 10.1038/s41587-019-0072-8 pubmed: 30936562
Lin, Y. et al. Assembly of long error-prone reads using de Bruijn graphs. Proc. Natl Acad. Sci. USA 113, E8396–E8405 (2016).
doi: 10.1073/pnas.1604560113 pubmed: 27956617 pmcid: 5206522
Kolmogorov, M. et al. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat. Methods 17, 1103–1110 (2020).
doi: 10.1038/s41592-020-00971-x pubmed: 33020656
Ekim, B., Berger, B. & Chikhi, R. Minimizer-space de Bruijn graphs: whole-genome assembly of long reads in minutes on a personal computer. Cell Syst. 12, 958–968.e6 (2021).
pubmed: 34525345 pmcid: 8562525
Vaser, R., Sović, I., Nagarajan, N. & Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
doi: 10.1101/gr.214270.116 pubmed: 28100585 pmcid: 5411768
Hon, T. et al. Highly accurate long-read HiFi sequencing data for five complex genomes. Sci. Data 7, 399 (2020).
doi: 10.1038/s41597-020-00743-4 pubmed: 33203859 pmcid: 7673114
Antipov, D., Raiko, M., Lapidus, A. & Pevzner, P. A. metaviralSPAdes: assembly of viruses from metagenomic data. Bioinformatics 36, 4126–4129 (2020).
doi: 10.1093/bioinformatics/btaa490 pubmed: 32413137
Nayfach, S. et al. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat. Biotechnol. 39, 578–585 (2021).
doi: 10.1038/s41587-020-00774-7 pubmed: 33349699
Williams, T. J., Allen, M. A., Panwar, P. & Cavicchioli, R. Into the darkness: the ecologies of novel 'microbial dark matter' phyla in an Antarctic lake. Environ. Microbiol. 24, 2576–2603 (2022).
doi: 10.1111/1462-2920.16026 pubmed: 35466505 pmcid: 9324843
Kadnikov, V. V., Mardanov, A. V., Beletsky, A. V., Karnachuk, O. V. & Ravin, N. V. Genome analysis of a member of the uncultured Phylum Riflebacteria revealed pathways of organotrophic metabolism and dissimilatory iron reduction. Microbiology 89, 328–336 (2020).
doi: 10.1134/S0026261720030078
Luo, X., Kang, X. & Schönhuth, A. VeChat: correcting errors in long reads using variation graphs. Nat. Commun. 13, 6657 (2022).
doi: 10.1038/s41467-022-34381-8 pubmed: 36333324 pmcid: 9636371
Holley, G. et al. Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly. Genome Biol. 22, 28 (2021).
doi: 10.1186/s13059-020-02244-4 pubmed: 33419473 pmcid: 7792008
Roberts, M., Hayes, W., Hunt, B. R., Mount, S. M. & Yorke, J. A. Reducing storage requirements for biological sequence comparison. Bioinformatics 20, 3363–3369 (2004).
doi: 10.1093/bioinformatics/bth408 pubmed: 15256412
Peng, Y., Leung, H. C. M., Yiu, S. M. & Chin, F. Y. L. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012).
doi: 10.1093/bioinformatics/bts174 pubmed: 22495754
Onodera, T., Sadakane, K. & Shibuya, T. Detecting superbubbles in assembly graphs. In Algorithms in bioinformatics: Proc. 13th International Workshop (Eds. Darling, A. & Stoye, J.) 338–348 (Springer, 2013).
Marco-Sola, S., Moure, J. C., Moreto, M. & Espinosa, A. Fast gap-affine pairwise alignment using the wavefront algorithm. Bioinformatics 37, 456–463 (2021).
doi: 10.1093/bioinformatics/btaa777 pubmed: 32915952
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
doi: 10.1093/bioinformatics/btp352 pubmed: 19505943 pmcid: 2723002
Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
doi: 10.7717/peerj.7359 pubmed: 31388474 pmcid: 6662567
Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
doi: 10.1093/bioinformatics/btp157 pubmed: 19307242 pmcid: 2732312
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010).
doi: 10.1371/journal.pone.0009490 pubmed: 20224823 pmcid: 2835736
Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics 38, 5315–5316 (2022).
doi: 10.1093/bioinformatics/btac672 pubmed: 36218463 pmcid: 9710552
Louca, S. & Doebeli, M. Efficient comparative phylogenetics on large trees. Bioinformatics 34, 1053–1055 (2018).
doi: 10.1093/bioinformatics/btx701 pubmed: 29091997
Yu, G. Using ggtree to visualize data on tree-like structures. Curr. Protoc. Bioinformatics 69, e96 (2020).
doi: 10.1002/cpbi.96 pubmed: 32162851
Wang, L. G. et al. treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Mol. Biol. Evol. 37, 599–603 (2020).
doi: 10.1093/molbev/msz240 pubmed: 31633786
Xu, S. et al. ggtreeExtra: compact visualization of richly annotated phylogenetic data. Mol. Biol. Evol. 38, 4039–4042 (2021).
doi: 10.1093/molbev/msab166 pubmed: 34097064 pmcid: 8382893
Blassel, L., Medvedev, P. & Chikhi, R. Mapping-friendly sequence reductions: going beyond homopolymer compression. iScience 25, 105305 (2022).
doi: 10.1016/j.isci.2022.105305 pubmed: 36339268 pmcid: 9633736

Authors

Gaëtan Benoit (G)

Organisms and Ecosystems, Earlham Institute, Norwich, UK.

Sébastien Raguideau (S)

Organisms and Ecosystems, Earlham Institute, Norwich, UK.

Robert James (R)

Gut Microbes and Health, Quadram Institute, Norwich, UK.

Adam M Phillippy (AM)

Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA.

Rayan Chikhi (R)

Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France.

Christopher Quince (C)

Organisms and Ecosystems, Earlham Institute, Norwich, UK. christopher.quince@earlham.ac.uk.
Gut Microbes and Health, Quadram Institute, Norwich, UK.
School of Biological Sciences, University of East Anglia, Norwich, UK.
Warwick Medical School, University of Warwick, Coventry, UK.

MeSH classifications