GreenHill: a de novo chromosome-level scaffolding and phasing tool using Hi-C.
Genome assembly
Haplotype
Hi-C
Phasing
Scaffolding
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date:
11 07 2023
History:
received: 27 04 2022
accepted: 04 07 2023
medline: 13 07 2023
pubmed: 12 07 2023
entrez: 11 07 2023
Status:
epublish
Abstract
Chromosome-level haplotype-resolved genome assembly is an important resource in molecular biology. However, current de novo haplotype assemblers require parental data or reference genomes and often fail to provide chromosome-level results. We present GreenHill, a novel scaffolding and phasing tool that considers various assemblers' contigs as input to reconstruct chromosome-level haplotypes using Hi-C without parental or reference data. Its unique functions include new error correction based on Hi-C contacts and the simultaneous use of Hi-C and long reads. Benchmarks reveal that GreenHill outperforms other approaches in contiguity and phasing accuracy, and the majority of chromosome arms are entirely phased.
Identifiers
pubmed: 37434204
doi: 10.1186/s13059-023-03006-8
pii: 10.1186/s13059-023-03006-8
pmc: PMC10334647
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
162
Copyright information
© 2023. The Author(s).