Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads.


Journal

Nature Methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604

Publication information

Publication date:
November 2021
History:
received: 8 March 2021
accepted: 6 September 2021
pubmed: 3 November 2021
medline: 29 December 2021
entrez: 2 November 2021
Status: ppublish

Abstract

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).

Identifiers

pubmed: 34725481
doi: 10.1038/s41592-021-01299-w
pii: 10.1038/s41592-021-01299-w
pmc: PMC8571015
mid: NIHMS1738709

Publication types

Journal Article; Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

1322-1332

Grants

Agency: NHGRI NIH HHS
ID: U01 HG010961
Country: United States
Agency: NIH HHS
ID: OT2 OD026682
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG010262
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010485
Country: United States
Agency: NHGRI NIH HHS
ID: U41 HG010972
Country: United States

Copyright information

© 2021. The Author(s), under exclusive licence to Springer Nature America, Inc.

Authors

Kishwar Shafin (K)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Trevor Pesout (T)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Pi-Chuan Chang (PC)

Google Inc, Mountain View, CA, USA.

Maria Nattestad (M)

Google Inc, Mountain View, CA, USA.

Alexey Kolesnikov (A)

Google Inc, Mountain View, CA, USA.

Sidharth Goel (S)

Google Inc, Mountain View, CA, USA.

Gunjan Baid (G)

Google Inc, Mountain View, CA, USA.

Mikhail Kolmogorov (M)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Jordan M Eizenga (JM)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Karen H Miga (KH)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Paolo Carnevali (P)

Chan Zuckerberg Initiative, Redwood City, CA, USA.

Miten Jain (M)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Andrew Carroll (A)

Google Inc, Mountain View, CA, USA. awcarroll@google.com.

Benedict Paten (B)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA. bpaten@ucsc.edu.
