Terabase-scale metagenome coassembly with MetaHipMer.

MeSH terms: Algorithms; Computational Biology / methods; Computers; Genome, Bacterial / genetics; Metagenome / genetics; Metagenomics / methods; Microbiota / genetics; Pseudoalteromonas / genetics; Sequence Analysis, DNA / methods

Journal

Scientific reports

ISSN: 2045-2322

Abbreviated title: Sci Rep

Country: England

NLM ID: 101563288

Publication information

Publication date:
01 07 2020

History:

received: 13 02 2020

accepted: 05 06 2020

entrez: 3 7 2020

pubmed: 3 7 2020

medline: 15 12 2020

Status: epublish

Abstract

Metagenome sequence datasets can contain terabytes of reads, too many to be coassembled together on a single shared-memory computer; consequently, they have only been assembled sample by sample (multiassembly) and combining the results is challenging. We can now perform coassembly of the largest datasets using MetaHipMer, a metagenome assembler designed to run on supercomputers and large clusters of compute nodes. We have reported on the implementation of MetaHipMer previously; in this paper we focus on analyzing the impact of very large coassembly. In particular, we show that coassembly recovers a larger genome fraction than multiassembly and enables the discovery of more complete genomes, with lower error rates, whereas multiassembly recovers more dominant strain variation. Being able to coassemble a large dataset does not preclude one from multiassembly; rather, having a fast, scalable metagenome assembler enables a user to more easily perform coassembly and multiassembly, and assemble both abundant, high strain variation genomes, and low-abundance, rare genomes. We present several assemblies of terabyte datasets that could never be coassembled before, demonstrating MetaHipMer's scaling power. MetaHipMer is available for public use under an open source license and all datasets used in the paper are available for public download.

Identifiers

DOI: 10.1038/s41598-020-67416-5

PMID: 32612216

PMC: PMC7329831

pii: 10.1038/s41598-020-67416-5

Publication types

Journal Article; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Pagination

10689



Authors

Steven Hofmeyr (S)

Rob Egan (R)

Evangelos Georganas (E)

Alex C Copeland (AC)

Robert Riley (R)

Alicia Clum (A)

Emiley Eloe-Fadrosh (E)

Simon Roux (S)

Eugene Goltsman (E)

Aydın Buluç (A)

Daniel Rokhsar (D)

Leonid Oliker (L)

Katherine Yelick (K)
