Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers.

Algorithms; Base Sequence; Benchmarking; Contig Mapping / methods; Datasets as Topic; Escherichia coli / genetics; Genome; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA / statistics & numerical data; Software; Staphylococcus aureus / genetics

Journal

Scientific reports

ISSN: 2045-2322

Abbreviated title: Sci Rep

Country: England

NLM ID: 101563288

Publication information

Publication date:
16 10 2019

History:

received: 22 03 2019

accepted: 16 09 2019

entrez: 18 10 2019

pubmed: 18 10 2019

medline: 3 11 2020

Status: epublish

Abstract

Remarkable advancements in high-throughput gene sequencing technologies have led to an exponential growth in the number of sequenced genomes. However, the unavailability of highly parallel and scalable de novo assembly algorithms has hindered biologists attempting to swiftly assemble high-quality complex genomes. Popular de Bruijn graph assemblers, such as IDBA-UD, generate high-quality assemblies by iterating over a set of k-values used in the construction of de Bruijn graphs (DBGs). However, sequentially iterating from small to large k-values slows down assembly. In this paper, we propose ScalaDBG, which transforms this sequential process by building DBGs for each distinct k-value in parallel. We develop an innovative mechanism to "patch" a higher k-valued graph with contigs generated from a lower k-valued graph. Moreover, ScalaDBG leverages multi-level parallelism by both scaling up on all cores of a node and scaling out to multiple nodes simultaneously. We demonstrate that ScalaDBG completes assembling the genome faster than IDBA-UD, but with similar accuracy, on a variety of datasets (6.8X faster for one of the most complex genomes in our dataset).
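The idea of building one DBG per k-value concurrently, rather than one after another, can be illustrated with a minimal Python sketch. This is not ScalaDBG's implementation: the function names `build_dbg` and `build_dbgs_in_parallel`, the toy reads, and the use of a thread pool are all assumptions made for illustration, and the "patching" step is not shown because the abstract does not specify it.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_dbg(reads, k):
    """Hypothetical helper: map each (k-1)-mer to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def build_dbgs_in_parallel(reads, k_values):
    """Hypothetical helper: build the graphs for all k-values concurrently."""
    with ThreadPoolExecutor(max_workers=len(k_values)) as pool:
        graphs = pool.map(lambda k: build_dbg(reads, k), k_values)
        # Consume the results inside the pool's lifetime, keyed by k.
        return dict(zip(k_values, graphs))


# Toy reads, invented for this example only.
reads = ["ACGTACGA", "GTACGATT"]
graphs = build_dbgs_in_parallel(reads, [3, 4, 5])
```

A thread pool keeps the sketch short and portable; a real assembler would more plausibly distribute the per-k work across processes or nodes, as the abstract's description of scaling up and scaling out suggests.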

Identifiers

DOI: 10.1038/s41598-019-51284-9 PMID: 31619717 PMC: PMC6795807

pubmed: 31619717

doi: 10.1038/s41598-019-51284-9

pii: 10.1038/s41598-019-51284-9

pmc: PMC6795807

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Pagination

14882

Grants

Agency: U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)

ID: 1R01AI123037

Country: International

Authors

Kanak Mahadik (K)

Christopher Wright (C)

Milind Kulkarni (M)

Saurabh Bagchi (S)

Somali Chaterji (S)
