Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation.


Journal

bioRxiv: the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187

Publication information

Publication date:
05 Apr 2023
History:
pubmed: 31 1 2023
medline: 31 1 2023
entrez: 30 1 2023
Status: epublish

Abstract

Long-read sequencing technologies substantially overcome the limitations of short reads but to date have not been considered a feasible replacement at scale because they have been too expensive, not scalable enough, or too error-prone. Here, we develop an efficient and scalable wet lab and computational protocol for Oxford Nanopore Technologies (ONT) long-read sequencing that seeks to provide a genuine alternative to short reads for large-scale genomics projects. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the NIH Center for Alzheimer's and Related Dementias (CARD). Using a single PromethION flow cell, we can detect SNPs with an F1-score better than that of Illumina short-read sequencing. Small indel calling remains difficult inside homopolymers and tandem repeats, but is comparable to Illumina calls elsewhere. Further, we can discover structural variants with an F1-score comparable to state-of-the-art methods involving Pacific Biosciences HiFi sequencing and trio information (but at a lower cost and greater throughput). Using ONT-based phasing, we can then combine and phase small and structural variants at megabase scales. Our protocol also produces highly accurate, haplotype-specific methylation calls. Overall, this makes large-scale long-read sequencing projects feasible; the protocol is currently being used to sequence thousands of brain-based genomes as part of the NIH CARD initiative. We provide the protocol and software as open-source integrated pipelines for generating phased variant calls and assemblies.

Identifiers

pubmed: 36711673
doi: 10.1101/2023.01.12.523790
pmc: PMC9882142

Publication types

Preprint

Languages

eng

Grants

Agency: NHGRI NIH HHS
ID: U01 HG010961
Country: United States
Agency: NIA NIH HHS
ID: P01 AG000538
Country: United States
Agency: NIA NIH HHS
ID: P30 AG072980
Country: United States
Agency: NHLBI NIH HHS
ID: OT3 HL142481
Country: United States
Agency: NIH HHS
ID: OT2 OD033761
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG011274
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG010262
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG011853
Country: United States
Agency: Intramural NIH HHS
ID: ZIA NS003154
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010485
Country: United States
Agency: NINDS NIH HHS
ID: U24 NS072026
Country: United States
Agency: NIA NIH HHS
ID: P30 AG019610
Country: United States
Agency: Intramural NIH HHS
ID: ZIA AG000538
Country: United States

Comments and corrections

Type: UpdateIn


Authors

Mikhail Kolmogorov (M)

Center for Cancer Research, National Cancer Institute, National Institutes of Health, USA.

Kimberley J Billingsley (KJ)

Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.

Mira Mastoras (M)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Melissa Meredith (M)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Jean Monlong (J)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Ryan Lorig-Roach (R)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Mobin Asri (M)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Pilar Alvarez Jerez (PA)

Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.

Laksh Malik (L)

Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.

Ramita Dewan (R)

Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.

Xylena Reed (X)

Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.

Rylee M Genner (RM)

Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.

Kensuke Daida (K)

Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.

Sairam Behera (S)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

Kishwar Shafin (K)

Google LLC, 1600 Amphitheatre Pkwy, Mountain View, CA, USA.

Trevor Pesout (T)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Jeshuwin Prabakaran (J)

Center for Cancer Research, National Cancer Institute, National Institutes of Health, USA.
Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, USA.

Paolo Carnevali (P)

Chan Zuckerberg Initiative, Redwood City, CA, USA.

Jianzhi Yang (J)

Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA.

Arang Rhie (A)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Sonja W Scholz (SW)

Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, USA.
Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA.

Bryan J Traynor (BJ)

Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.

Karen H Miga (KH)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Miten Jain (M)

Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA.

Winston Timp (W)

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.

Adam M Phillippy (AM)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Mark Chaisson (M)

Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA.

Fritz J Sedlazeck (FJ)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
Department of Computer Science, Rice University, Houston, Texas, USA.

Cornelis Blauwendraat (C)

Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.

Benedict Paten (B)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

MeSH classifications