Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.


Journal

Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
October 2019
History:
received: 23 January 2019
accepted: 8 July 2019
pubmed: 14 August 2019
medline: 7 November 2019
entrez: 14 August 2019
Status: ppublish

Abstract

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
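
The contig N50 quoted in the abstract is a standard contiguity statistic: the length of the contig at which the running total, taken over contigs sorted from longest to shortest, first reaches half of the total assembly size. As a minimal sketch of that definition (the function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from the paper):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in Mb, chosen only to illustrate the definition.
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))
```

In this toy assembly the total is 30 Mb, and the two longest contigs (10 Mb and 6 Mb) already cover more than half of it, so the N50 is 6 Mb. An N50 above 15 Mb, as reported in the abstract, means that at least half of the assembled bases lie in contigs longer than 15 Mb.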

Identifiers

pubmed: 31406327
doi: 10.1038/s41587-019-0217-9
pii: 10.1038/s41587-019-0217-9
pmc: PMC6776680
mid: NIHMS1533949

Chemical substances

DNA, Circular 0

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

1155-1162

Grants

Agency: NHGRI NIH HHS
ID: R01 HG006677
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010040
Country: United States

Authors

Aaron M Wenger (AM)

Pacific Biosciences, Menlo Park, CA, USA.

Paul Peluso (P)

Pacific Biosciences, Menlo Park, CA, USA.

William J Rowell (WJ)

Pacific Biosciences, Menlo Park, CA, USA.

Pi-Chuan Chang (PC)

Google Inc., Mountain View, CA, USA.

Richard J Hall (RJ)

Pacific Biosciences, Menlo Park, CA, USA.

Gregory T Concepcion (GT)

Pacific Biosciences, Menlo Park, CA, USA.

Jana Ebler (J)

Center for Bioinformatics, Saarland University, Saarbrücken, Germany.
Max Planck Institute for Informatics, Saarbrücken, Germany.
Graduate School of Computer Science, Saarland University, Saarbrücken, Germany.

Arkarachai Fungtammasan (A)

DNAnexus, Mountain View, CA, USA.

Alexey Kolesnikov (A)

Google Inc., Mountain View, CA, USA.

Nathan D Olson (ND)

National Institute of Standards and Technology, Gaithersburg, MD, USA.

Armin Töpfer (A)

Pacific Biosciences, Menlo Park, CA, USA.

Michael Alonge (M)

Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.

Medhat Mahmoud (M)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

Yufeng Qian (Y)

Pacific Biosciences, Menlo Park, CA, USA.

Chen-Shan Chin (CS)

DNAnexus, Mountain View, CA, USA.

Adam M Phillippy (AM)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA.

Michael C Schatz (MC)

Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.

Gene Myers (G)

Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany.

Mark A DePristo (MA)

Google Inc., Mountain View, CA, USA.

Jue Ruan (J)

Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen, China.

Tobias Marschall (T)

Center for Bioinformatics, Saarland University, Saarbrücken, Germany.
Max Planck Institute for Informatics, Saarbrücken, Germany.

Fritz J Sedlazeck (FJ)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

Justin M Zook (JM)

National Institute of Standards and Technology, Gaithersburg, MD, USA.

Heng Li (H)

Dana-Farber Cancer Institute, Boston, MA, USA.

Sergey Koren (S)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA.

Andrew Carroll (A)

Google Inc., Mountain View, CA, USA.

David R Rank (DR)

Pacific Biosciences, Menlo Park, CA, USA. drank@pacb.com.

Michael W Hunkapiller (MW)

Pacific Biosciences, Menlo Park, CA, USA. mhunkapiller@pacb.com.
