High-quality metagenome assembly from long accurate reads with metaMDBG.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date:
02 Jan 2024
History:
received: 17 Feb 2023
accepted: 08 Sep 2023
medline: 04 Jan 2024
pubmed: 04 Jan 2024
entrez: 03 Jan 2024
Status:
aheadofprint
Abstract
We introduce metaMDBG, a metagenomics assembler for PacBio HiFi reads. MetaMDBG combines a de Bruijn graph assembly in a minimizer space with an iterative assembly over sequences of minimizers to address variations in genome coverage depth and an abundance-based filtering strategy to simplify strain complexity. For complex communities, we obtained up to twice as many high-quality circularized prokaryotic metagenome-assembled genomes as existing methods and had better recovery of viruses and plasmids.
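As an illustration of what a de Bruijn graph "in a minimizer space" with abundance-based filtering can look like, here is a minimal Python sketch. It is not metaMDBG's implementation: the function names (`kmer_hash`, `minimizer_sequence`, `build_graph`) and the parameters (`k`, `density`, `order`, `min_abundance`) are hypothetical, the minimizer selection rule (keep k-mers whose hash falls below a density threshold) is an assumption, and reverse complements and the iterative multi-order assembly described in the abstract are left out.

```python
import hashlib
from collections import defaultdict


def kmer_hash(kmer: str) -> int:
    """Map a k-mer to a pseudo-random 64-bit integer (illustrative hash choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_sequence(read: str, k: int, density: float) -> list[str]:
    """Keep, in read order, the k-mers whose hash falls below density * 2**64.

    Assumption: this density-threshold rule stands in for whatever
    selection scheme a real assembler uses.
    """
    threshold = density * 2**64
    kmers = (read[i:i + k] for i in range(len(read) - k + 1))
    return [kmer for kmer in kmers if kmer_hash(kmer) < threshold]


def build_graph(reads, k, density, order, min_abundance=1):
    """Build a de Bruijn graph whose nodes are tuples of `order` minimizers.

    Nodes seen fewer than `min_abundance` times are dropped, a crude
    stand-in for abundance-based filtering.
    """
    counts = defaultdict(int)
    for read in reads:
        mins = minimizer_sequence(read, k, density)
        for i in range(len(mins) - order + 1):
            counts[tuple(mins[i:i + order])] += 1

    nodes = {node for node, c in counts.items() if c >= min_abundance}

    # Index nodes by their first (order - 1) minimizers to find overlaps.
    by_prefix = defaultdict(set)
    for node in nodes:
        by_prefix[node[:-1]].add(node)

    # An edge joins u to v when the suffix of u equals the prefix of v.
    edges = {node: set(by_prefix.get(node[1:], set())) for node in nodes}
    return dict(counts), edges
```

Calling `build_graph(reads, k=13, density=0.005, order=4)` on a list of read strings would return the node abundances and the adjacency map; those parameter values are placeholders chosen for illustration only. With a low `density`, each read collapses to a short list of minimizers, which is why a graph built this way is far smaller than a base-level de Bruijn graph.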
Identifiers
pubmed: 38168989
doi: 10.1038/s41587-023-01983-6
pii: 10.1038/s41587-023-01983-6
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: RCUK | Natural Environment Research Council (NERC)
ID: NE/T013230/1
Agency: RCUK | Medical Research Council (MRC)
ID: MR/S037195/1
Agency: RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
ID: BB/CSP1720/1
Agency: RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
ID: BBS/E/T/000PR9818
Agency: RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
ID: BBS/E/T/000PR9817
Agency: RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
ID: BB/N023285/1
Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 956229
Copyright information
© 2024. The Author(s).