Local read haplotagging enables accurate long-read small variant calling.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
13 Jul 2024
History:
received: 20 Sep 2023
accepted: 28 Jun 2024
medline: 14 Jul 2024
pubmed: 14 Jul 2024
entrez: 13 Jul 2024
Status:
epublish
Abstract
Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long reads to improve genotyping accuracy. However, using local haplotype information creates an overhead, as variant calling needs to be performed multiple times, which ultimately makes it difficult to extend to new data types and platforms as they are introduced. In this work, we have developed a local haplotype approximation method that enables state-of-the-art variant calling performance with multiple sequencing platforms, including the PacBio Revio system and ONT R10.4 simplex and duplex data. This addition of local haplotype approximation simplifies long-read variant calling with DeepVariant.
Identifiers
pubmed: 39003259
doi: 10.1038/s41467-024-50079-5
pii: 10.1038/s41467-024-50079-5
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
5907
Copyright information
© 2024. The Author(s).