Integrating Hi-C links with assembly graphs for chromosome-scale assembly.


Journal

PLoS computational biology
ISSN: 1553-7358
Abbreviated title: PLoS Comput Biol
Country: United States
NLM ID: 101238922

Publication information

Publication date:
Aug 2019
History:
received: 11 Jan 2019
accepted: 18 Jul 2019
revised: 3 Sep 2019
pubmed: 23 Aug 2019
medline: 21 Jan 2020
entrez: 22 Aug 2019
Status: epublish

Abstract

Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, few open-source tools are available, and error rates, particularly for inversions and fusions across chromosomes, remain higher than with alternative scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.

Identifiers

pubmed: 31433799
doi: 10.1371/journal.pcbi.1007273
pii: PCOMPBIOL-D-19-00061
pmc: PMC6719893

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e1007273

Grants

Agency: NIAID NIH HHS
ID: R01 AI100947
Country: United States
Agency: NHGRI NIH HHS
ID: R44 HG009584
Country: United States

Conflict of interest statement

Sergey Koren has received travel and accommodation expenses to speak at Oxford Nanopore Technologies conferences. Anthony Schmitt and Siddarth Selvaraj are employees of Arima Genomics, a company commercializing Hi-C DNA sequencing technologies.

Authors

Jay Ghurye (J)

Department of Computer Science, University of Maryland, College Park, Maryland, United States of America.
Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institute of Health, Bethesda, Maryland, United States of America.

Arang Rhie (A)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institute of Health, Bethesda, Maryland, United States of America.

Brian P Walenz (BP)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institute of Health, Bethesda, Maryland, United States of America.

Anthony Schmitt (A)

Arima Genomics, San Diego, California, United States of America.

Siddarth Selvaraj (S)

Arima Genomics, San Diego, California, United States of America.

Mihai Pop (M)

Department of Computer Science, University of Maryland, College Park, Maryland, United States of America.

Adam M Phillippy (AM)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institute of Health, Bethesda, Maryland, United States of America.

Sergey Koren (S)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institute of Health, Bethesda, Maryland, United States of America.
