Chromosome-scale, haplotype-resolved assembly of human genomes.
Journal
Nature biotechnology
ISSN: 1546-1696
Titre abrégé: Nat Biotechnol
Pays: United States
ID NLM: 9604648
Informations de publication
Date de publication:
03 2021
03 2021
Historique:
received:
21
10
2019
accepted:
17
09
2020
revised:
09
09
2020
pubmed:
9
12
2020
medline:
15
4
2021
entrez:
8
12
2020
Statut:
ppublish
Résumé
Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved assemblies with minimum contig length needed to cover 50% of the known genome (NG50) up to 25 Mb and phased ~99.5% of heterozygous sites at 98-99% accuracy, outperforming other approaches in terms of both contiguity and phasing completeness. We demonstrate the importance of chromosome-scale phased assemblies for the discovery of structural variants (SVs), including thousands of new transposon insertions, and of highly polymorphic and medically important regions such as the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) regions. DipAsm will facilitate high-quality precision medicine and studies of individual haplotype variation and population diversity.
Identifiants
pubmed: 33288905
doi: 10.1038/s41587-020-0711-0
pii: 10.1038/s41587-020-0711-0
pmc: PMC7954703
mid: NIHMS1630556
doi:
Types de publication
Journal Article
Research Support, N.I.H., Extramural
Langues
eng
Sous-ensembles de citation
IM
Pagination
309-312Subventions
Organisme : NHGRI NIH HHS
ID : R01 HG010040
Pays : United States
Organisme : NHGRI NIH HHS
ID : RM1 HG008525
Pays : United States
Organisme : NHGRI NIH HHS
ID : U01 HG010971
Pays : United States
Organisme : NHGRI NIH HHS
ID : K99 HG010906
Pays : United States
Organisme : NHGRI NIH HHS
ID : UM1 HG008898
Pays : United States
Références
Tewhey, R., Bansal, V., Torkamani, A., Topol, E. J. & Schork, N. J. The importance of phase information for human genomics. Nat. Rev. Genet. 12, 215–223 (2011).
doi: 10.1038/nrg2950
Vinson, J. P. et al. Assembly of polymorphic genomes: algorithms and application to Ciona savignyi. Genome Res. 15, 1127–1135 (2005).
doi: 10.1101/gr.3722605
Chin, C.-S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
doi: 10.1038/nmeth.4035
Weisenfeld, N. I., Kumar, V., Shah, P., Church, D. M. & Jaffe, D. B. Direct determination of diploid genome sequences. Genome Res. 27, 757–767 (2017).
doi: 10.1101/gr.214874.116
Garg, S. et al. A graph-based approach to diploid genome assembly. Bioinformatics 34, i105–i114 (2018).
doi: 10.1093/bioinformatics/bty279
Kronenberg, Z. N. et al. Extended haplotype phasing of de novo genome assemblies with FALCON-Phase. Preprint at bioRxiv https://doi.org/10.1101/327064 (2018).
Koren, S. et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. 36, 1174–1182 (2018).
Garg, S. et al. A haplotype-aware de novo assembly of related individuals using pedigree sequence graph. Bioinformatics 36, 2385–2392 (2019).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).
doi: 10.1038/s41587-019-0217-9
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
doi: 10.1038/nbt.2727
Chin, C.-S. & Khalak, A. Human genome assembly in 100 minutes. Preprint at bioRxiv https://doi.org/10.1101/705616 (2019).
Dudchenko, O. et al. De novo assembly of the genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
doi: 10.1126/science.aal3327
Putnam, N. H. et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 26, 342–350 (2016).
doi: 10.1101/gr.193474.115
Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).
doi: 10.1038/nbt.4235
Martin, M. et al. WhatsHap: fast and accurate read-based phasing. Preprint at bioRxiv https://doi.org/10.1101/085050 (2016).
Edge, P., Bafna, V. & Bansal, V. HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. 27, 801–812 (2017).
doi: 10.1101/gr.213462.116
Zook, J. M. et al. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat. Biotechnol. 32, 246–251 (2014).
doi: 10.1038/nbt.2835
Zook, J. M. et al. An open resource for accurately benchmarking small variant and reference calls. Nat. Biotechnol. 37, 561–566 (2019).
doi: 10.1038/s41587-019-0074-6
Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1784 (2019).
doi: 10.1038/s41467-018-08148-z
Porubsky, D. et al. A fully phased accurate assembly of an individual human genome. Preprint at bioRxiv https://doi.org/10.1101/855049 (2019).
Li, H. et al. A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat. Methods 15, 595–597 (2018).
doi: 10.1038/s41592-018-0054-7
Zook, J. M. et al. A robust benchmark for germline structural variant detection. Nat. Biotechnol. https://doi.org/10.1038/s41587-020-0538-8 (2020).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open v.4.0 (2015); http://www.repeatmasker.org
Nir, G. et al. Walking along chromosomes with super-resolution imaging, contact maps, and integrative modeling. PLoS Genet. 14, e1007872 (2018).
doi: 10.1371/journal.pgen.1007872