Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads.
Journal
Nature methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604
Publication information
Publication date:
11 2021
History:
received: 08 03 2021
accepted: 06 09 2021
pubmed: 3 11 2021
medline: 29 12 2021
entrez: 2 11 2021
Status:
ppublish
Abstract
Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).
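The abstract reports assembly accuracy as Q35+ and Q40+ without defining the scale. As a minimal sketch, assuming these are standard Phred-scaled quality values (Q = -10 log10 of the per-base error probability), the conversion below shows what such figures imply. The function names are illustrative and are not part of PEPPER-Margin-DeepVariant.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Per-base error probability implied by a Phred-scaled quality value.

    Assumes the standard definition Q = -10 * log10(p).
    """
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Phred-scaled quality value for a per-base error probability p."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


# The two thresholds quoted in the abstract, read as Phred-scaled values.
for label, q in [("nanopore-polished", 35), ("PacBio HiFi-polished", 40)]:
    p = phred_to_error_rate(q)
    print(f"{label}: Q{q} -> about 1 error per {round(1 / p):,} bases")
```

Under that assumption, each 10-point increase in Q corresponds to a tenfold drop in the expected per-base error rate.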
Identifiers
pubmed: 34725481
doi: 10.1038/s41592-021-01299-w
pii: 10.1038/s41592-021-01299-w
pmc: PMC8571015
mid: NIHMS1738709
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
1322-1332
Grants
Agency: NHGRI NIH HHS
ID: U01 HG010961
Country: United States
Agency: NIH HHS
ID: OT2 OD026682
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG010262
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010485
Country: United States
Agency: NHGRI NIH HHS
ID: U41 HG010972
Country: United States
Copyright information
© 2021. The Author(s), under exclusive licence to Springer Nature America, Inc.