Terabase-scale metagenome coassembly with MetaHipMer.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
2020 Jul 1
History:
received: 2020 Feb 13
accepted: 2020 Jun 5
entrez: 2020 Jul 3
pubmed: 2020 Jul 3
medline: 2020 Dec 15
Status:
epublish
Abstract
Metagenome sequence datasets can contain terabytes of reads, too many to be coassembled together on a single shared-memory computer; consequently, they have only been assembled sample by sample (multiassembly), and combining the results is challenging. We can now perform coassembly of the largest datasets using MetaHipMer, a metagenome assembler designed to run on supercomputers and large clusters of compute nodes. We have reported on the implementation of MetaHipMer previously; in this paper we focus on analyzing the impact of very large coassembly. In particular, we show that coassembly recovers a larger genome fraction than multiassembly and enables the discovery of more complete genomes with lower error rates, whereas multiassembly recovers more dominant strain variation. Being able to coassemble a large dataset does not preclude one from multiassembly; rather, having a fast, scalable metagenome assembler enables a user to more easily perform both coassembly and multiassembly, and to assemble both abundant genomes with high strain variation and low-abundance, rare genomes. We present several assemblies of terabyte datasets that could never be coassembled before, demonstrating MetaHipMer's scaling power. MetaHipMer is available for public use under an open source license, and all datasets used in the paper are available for public download.
Identifiers
pubmed: 32612216
doi: 10.1038/s41598-020-67416-5
pii: 10.1038/s41598-020-67416-5
pmc: PMC7329831
Publication types
Journal Article
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
10689