Chromosome-scale, haplotype-resolved assembly of human genomes.


Journal

Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
03 2021
History:
received: 21 10 2019
accepted: 17 09 2020
revised: 09 09 2020
pubmed: 9 12 2020
medline: 15 4 2021
entrez: 8 12 2020
Status: ppublish

Abstract

Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved assemblies with minimum contig length needed to cover 50% of the known genome (NG50) up to 25 Mb and phased ~99.5% of heterozygous sites at 98-99% accuracy, outperforming other approaches in terms of both contiguity and phasing completeness. We demonstrate the importance of chromosome-scale phased assemblies for the discovery of structural variants (SVs), including thousands of new transposon insertions, and of highly polymorphic and medically important regions such as the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) regions. DipAsm will facilitate high-quality precision medicine and studies of individual haplotype variation and population diversity.
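The NG50 statistic quoted in the abstract can be illustrated with a minimal sketch. It assumes the standard reading of the definition given above: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the known genome size. The function name `ng50` and the toy contig lengths are hypothetical and are not taken from the paper or from the DipAsm code.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if it is undefined.

    NG50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of genome_size.
    """
    target = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    # The assembly covers less than half of the genome: NG50 is undefined.
    return None


# Toy contig lengths (hypothetical values, not from the paper).
toy_contigs = [40, 30, 20, 10, 5]
print(ng50(toy_contigs, genome_size=120))  # prints 30
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the known genome size, so assemblies of the same genome remain comparable even when their total lengths differ.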

Identifiers

pubmed: 33288905
doi: 10.1038/s41587-020-0711-0
pii: 10.1038/s41587-020-0711-0
pmc: PMC7954703
mid: NIHMS1630556

Publication types

Journal Article
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

309-312

Grants

Agency: NHGRI NIH HHS
ID: R01 HG010040
Country: United States
Agency: NHGRI NIH HHS
ID: RM1 HG008525
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG010971
Country: United States
Agency: NHGRI NIH HHS
ID: K99 HG010906
Country: United States
Agency: NHGRI NIH HHS
ID: UM1 HG008898
Country: United States

References

Tewhey, R., Bansal, V., Torkamani, A., Topol, E. J. & Schork, N. J. The importance of phase information for human genomics. Nat. Rev. Genet. 12, 215–223 (2011).
doi: 10.1038/nrg2950
Vinson, J. P. et al. Assembly of polymorphic genomes: algorithms and application to Ciona savignyi. Genome Res. 15, 1127–1135 (2005).
doi: 10.1101/gr.3722605
Chin, C.-S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
doi: 10.1038/nmeth.4035
Weisenfeld, N. I., Kumar, V., Shah, P., Church, D. M. & Jaffe, D. B. Direct determination of diploid genome sequences. Genome Res. 27, 757–767 (2017).
doi: 10.1101/gr.214874.116
Garg, S. et al. A graph-based approach to diploid genome assembly. Bioinformatics 34, i105–i114 (2018).
doi: 10.1093/bioinformatics/bty279
Kronenberg, Z. N. et al. Extended haplotype phasing of de novo genome assemblies with FALCON-Phase. Preprint at bioRxiv https://doi.org/10.1101/327064 (2018).
Koren, S. et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. 36, 1174–1182 (2018).
Garg, S. et al. A haplotype-aware de novo assembly of related individuals using pedigree sequence graph. Bioinformatics 36, 2385–2392 (2019).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).
doi: 10.1038/s41587-019-0217-9
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
doi: 10.1038/nbt.2727
Chin, C.-S. & Khalak, A. Human genome assembly in 100 minutes. Preprint at bioRxiv https://doi.org/10.1101/705616 (2019).
Dudchenko, O. et al. De novo assembly of the genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
doi: 10.1126/science.aal3327
Putnam, N. H. et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 26, 342–350 (2016).
doi: 10.1101/gr.193474.115
Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).
doi: 10.1038/nbt.4235
Martin, M. et al. WhatsHap: fast and accurate read-based phasing. Preprint at bioRxiv https://doi.org/10.1101/085050 (2016).
Edge, P., Bafna, V. & Bansal, V. HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. 27, 801–812 (2017).
doi: 10.1101/gr.213462.116
Zook, J. M. et al. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat. Biotechnol. 32, 246–251 (2014).
doi: 10.1038/nbt.2835
Zook, J. M. et al. An open resource for accurately benchmarking small variant and reference calls. Nat. Biotechnol. 37, 561–566 (2019).
doi: 10.1038/s41587-019-0074-6
Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1784 (2019).
doi: 10.1038/s41467-018-08148-z
Porubsky, D. et al. A fully phased accurate assembly of an individual human genome. Preprint at bioRxiv https://doi.org/10.1101/855049 (2019).
Li, H. et al. A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat. Methods 15, 595–597 (2018).
doi: 10.1038/s41592-018-0054-7
Zook, J. M. et al. A robust benchmark for germline structural variant detection. Nat. Biotechnol. https://doi.org/10.1038/s41587-020-0538-8 (2020).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open v.4.0 (2015); http://www.repeatmasker.org
Nir, G. et al. Walking along chromosomes with super-resolution imaging, contact maps, and integrative modeling. PLoS Genet. 14, e1007872 (2018).
doi: 10.1371/journal.pgen.1007872

Authors

Shilpa Garg (S)

Department of Genetics, Harvard Medical School, Boston, MA, USA. shilpa_garg@hms.harvard.edu.
Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA. shilpa_garg@hms.harvard.edu.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA. shilpa_garg@hms.harvard.edu.

Arkarachai Fungtammasan (A)

DNAnexus, Mountain View, CA, USA.

Andrew Carroll (A)

Google, Mountain View, CA, USA.

Mike Chou (M)

Department of Genetics, Harvard Medical School, Boston, MA, USA.

Anthony Schmitt (A)

Arima Genomics, San Diego, CA, USA.

Xiang Zhou (X)

Arima Genomics, San Diego, CA, USA.

Stephen Mac (S)

Arima Genomics, San Diego, CA, USA.

Paul Peluso (P)

Pacific Biosciences, Menlo Park, CA, USA.

Emily Hatas (E)

Pacific Biosciences, Menlo Park, CA, USA.

Jay Ghurye (J)

Dovetail Genomics, Scotts Valley, CA, USA.

Jared Maguire (J)

Dovetail Genomics, Scotts Valley, CA, USA.

Medhat Mahmoud (M)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

Haoyu Cheng (H)

Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

David Heller (D)

Max Planck Institute for Molecular Genetics, Berlin, Germany.

Justin M Zook (JM)

Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.

Tobias Moemke (T)

Saarland University, Saarbrücken, Germany.

Tobias Marschall (T)

Saarland University, Saarbrücken, Germany.
Max Planck Institute for Informatics, Saarbrücken, Germany.

Fritz J Sedlazeck (FJ)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

John Aach (J)

Department of Genetics, Harvard Medical School, Boston, MA, USA.

Chen-Shan Chin (CS)

DNAnexus, Mountain View, CA, USA. jchin@dnanexus.com.

George M Church (GM)

Department of Genetics, Harvard Medical School, Boston, MA, USA. gchurch@genetics.med.harvard.edu.

Heng Li (H)

Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA. hli@ds.dfci.harvard.edu.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA. hli@ds.dfci.harvard.edu.
