Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C.
Journal: Nature Communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date: 28 April 2021
History:
Received: 13 May 2020
Accepted: 12 November 2020
Entrez: 29 April 2021
PubMed: 30 April 2021
MEDLINE: 13 May 2021
Status: epublish
Abstract
Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have best been created with complex protocols, such as cultured cells that contain a single-haplotype (haploid) genome, single cells in which haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend the phase blocks of partially phased diploid assemblies to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, skipping variant calling, and reduces the computational complexity of phasing. Our method is validated on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP), from human, cow, and zebra finch, for which high-quality, fully haplotype-resolved assemblies are available from the trio-based approach. FALCON-Phase is accurate without parental data, and its performance is better in samples with higher heterozygosity. For cow and zebra finch, the accuracy is 97%, compared with 80-91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.
Identifiers
pubmed: 33911078
doi: 10.1038/s41467-020-20536-y
pii: 10.1038/s41467-020-20536-y
pmc: PMC8081726
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages: eng
Citation subsets: IM
Pagination: 1935
Grants
Agency: Howard Hughes Medical Institute
Country: United States