Haplotype-resolved assembly of diploid genomes without parental data.


Journal

Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
09 2022
History:
received: 10 09 2021
accepted: 14 02 2022
pubmed: 26 3 2022
medline: 14 9 2022
entrez: 25 3 2022
Status: ppublish

Abstract

Routine haplotype-resolved genome assembly from single samples remains an unresolved problem. Here we describe an algorithm that combines PacBio HiFi reads and Hi-C chromatin interaction data to produce a haplotype-resolved assembly without the sequencing of parents. Applied to human and other vertebrate samples, our algorithm consistently outperforms existing single-sample assembly pipelines and generates assemblies of similar quality to the best pedigree-based assemblies.

Identifiers

pubmed: 35332338
doi: 10.1038/s41587-022-01261-x
pii: 10.1038/s41587-022-01261-x
pmc: PMC9464699
mid: NIHMS1825186

Publication types

Journal Article
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

1332-1335

Grants

Agency: NHGRI NIH HHS
ID: U01 HG010971
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010040
Country: United States
Agency: NHGRI NIH HHS
ID: U41 HG010972
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG010961
Country: United States
Agency: Howard Hughes Medical Institute
Country: United States

Copyright information

© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.

Authors

Haoyu Cheng (H)

Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Erich D Jarvis (ED)

The Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA.
Howard Hughes Medical Institute, Chevy Chase, MD, USA.

Olivier Fedrigo (O)

The Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA.

Klaus-Peter Koepfli (KP)

Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA, USA.
Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington D.C., USA.
ITMO University, Computer Technologies Laboratory, St. Petersburg, Russia.

Lara Urban (L)

Department of Anatomy, University of Otago, Dunedin, New Zealand.

Neil J Gemmell (NJ)

Department of Anatomy, University of Otago, Dunedin, New Zealand.

Heng Li (H)

Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA. hli@ds.dfci.harvard.edu.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA. hli@ds.dfci.harvard.edu.
