Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date: October 2019
History:
received: 23 January 2019
accepted: 8 July 2019
pubmed: 14 August 2019
medline: 7 November 2019
entrez: 14 August 2019
Status: ppublish (published in print)
Abstract
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
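The abstract reports results as precision, recall and contig N50. The following is a minimal Python sketch of the standard textbook definitions of these metrics. It is not the authors' benchmarking or assembly-evaluation pipeline. The helper names `precision_recall` and `n50` are hypothetical, and the numbers are toy values, not data from the study.

```python
from __future__ import annotations


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Standard variant-calling metrics from true/false positive and false negative counts.

    precision = TP / (TP + FP): fraction of called variants that are correct.
    recall    = TP / (TP + FN): fraction of true variants that were called.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


def n50(contig_lengths: list[int]) -> int:
    """Contig N50: the length L such that contigs of length >= L cover at least
    half of the total assembly length (hypothetical helper for illustration)."""
    if not contig_lengths:
        raise ValueError("contig_lengths must be non-empty")
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:  # cumulative length has reached half the assembly
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches half")


if __name__ == "__main__":
    # Toy counts only; not the HG002 results reported above.
    p, r = precision_recall(tp=9_990, fp=5, fn=10)
    print(f"precision={p:.4f} recall={r:.4f}")  # precision=0.9995 recall=0.9990
    # Total length 290; cumulative 100 + 80 = 180 >= 145, so N50 = 80.
    print(n50([100, 80, 50, 30, 20, 10]))  # 80
```

Some tools compute N50 with a strict ">" instead of ">=" at the halfway point, so reported values can differ slightly between implementations.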
Identifiers
pubmed: 31406327
doi: 10.1038/s41587-019-0217-9
pii: 10.1038/s41587-019-0217-9
pmc: PMC6776680
mid: NIHMS1533949
Chemical substances
DNA, Circular
0
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
1155-1162
Grants
Agency: NHGRI NIH HHS
ID: R01 HG006677
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010040
Country: United States