Haplotype-resolved assembly of diploid genomes without parental data.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date:
09 2022
History:
received: 2021-09-10
accepted: 2022-02-14
pubmed: 2022-03-26
medline: 2022-09-14
entrez: 2022-03-25
Status:
ppublish
Abstract
Routine haplotype-resolved genome assembly from single samples remains an unresolved problem. Here we describe an algorithm that combines PacBio HiFi reads and Hi-C chromatin interaction data to produce a haplotype-resolved assembly without the sequencing of parents. Applied to human and other vertebrate samples, our algorithm consistently outperforms existing single-sample assembly pipelines and generates assemblies of similar quality to the best pedigree-based assemblies.
Identifiers
pubmed: 35332338
doi: 10.1038/s41587-022-01261-x
pii: 10.1038/s41587-022-01261-x
pmc: PMC9464699
mid: NIHMS1825186
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
1332-1335
Grants
Agency: NHGRI NIH HHS
ID: U01 HG010971
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010040
Country: United States
Agency: NHGRI NIH HHS
ID: U41 HG010972
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG010961
Country: United States
Agency: Howard Hughes Medical Institute
Country: United States
Copyright information
© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.
References
Logsdon, G. A., Vollger, M. R. & Eichler, E. E. Long-read human genome sequencing and its applications. Nat. Rev. Genet. 21, 597–614 (2020).
doi: 10.1038/s41576-020-0236-x
Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).
doi: 10.1038/s41586-021-03451-0
Chin, C.-S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
doi: 10.1038/nmeth.4035
Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 30, 1291–1305 (2020).
doi: 10.1101/gr.263566.120
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
doi: 10.1038/s41592-020-01056-5
Luo, X., Kang, X. & Schönhuth, A. phasebook: haplotype-aware de novo assembly of diploid genomes from long reads. Genome Biol. 22, 299 (2021).
doi: 10.1186/s13059-021-02512-x
Koren, S. et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. 36, 1174–1182 (2018).
doi: 10.1038/nbt.4277
Garg, S. et al. Chromosome-scale, haplotype-resolved assembly of human genomes. Nat. Biotechnol. 39, 309–312 (2021).
doi: 10.1038/s41587-020-0711-0
Porubsky, D. et al. Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads. Nat. Biotechnol. 39, 302–308 (2021).
doi: 10.1038/s41587-020-0719-5
Kronenberg, Z. N. et al. Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C. Nat. Commun. 12, 1–10 (2021).
doi: 10.1038/s41467-020-20536-y
Edge, P., Bafna, V. & Bansal, V. HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. 27, 801–812 (2017).
doi: 10.1101/gr.213462.116
Tourdot, R. W., Brunette, G. J., Pinto, R. A. & Zhang, C.-Z. Determination of complete chromosomal haplotypes by bulk DNA sequencing. Genome Biol. 22, 139 (2021).
doi: 10.1186/s13059-021-02330-1
Chin, C.-S. & Khalak, A. Human genome assembly in 100 minutes. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/705616v1 (2019).
Makeyev, A. V. et al. GTF2IRD2 is located in the Williams–Beuren syndrome critical region 7q11.23 and encodes a protein with two TFII-I-like helix–loop–helix repeats. Proc. Natl Acad. Sci. USA 101, 11052–11057 (2004).
doi: 10.1073/pnas.0404150101
Wagner, J. et al. Curated variation benchmarks for challenging medically relevant autosomal genes. Nat. Biotechnol. https://doi.org/10.1038/s41587-021-01158-1 (2022).
Darwin Tree of Life Project Consortium. Sequence locally, think globally: the Darwin Tree of Life Project. Proc. Natl Acad. Sci. USA 119, e2115642118 (2022).
doi: 10.1073/pnas.2115642118
Du, K. et al. The sterlet sturgeon genome sequence and the mechanisms of segmental rediploidization. Nat. Ecol. Evol. 4, 841–852 (2020).
doi: 10.1038/s41559-020-1166-x
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
doi: 10.1093/bioinformatics/bty191
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
doi: 10.1093/bioinformatics/btv351
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
doi: 10.1093/bioinformatics/btaa025