Physical separation of haplotypes in dikaryons allows benchmarking of phasing accuracy in Nanopore and HiFi assemblies with Hi-C data.
Chromosomes
Genome assembly
Hi-C
HiFi
Long-read sequencing
Nanopore
Phase switches
Phasing
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date:
25 03 2022
History:
received: 29 04 2021
accepted: 21 03 2022
entrez: 26 3 2022
pubmed: 27 3 2022
medline: 3 5 2022
Status:
epublish
Abstract
Most animals and plants have more than one set of chromosomes and package these haplotypes into a single nucleus within each cell. In contrast, many fungal species carry multiple haploid nuclei per cell. Rust fungi are such species, with two nuclei (karyons) that each contain a full set of haploid chromosomes. The physical separation of haplotypes in dikaryons means that, unlike in diploids, Hi-C chromatin contacts between haplotypes are false-positive signals. We generate the first chromosome-scale, fully phased assembly for the dikaryotic leaf rust fungus Puccinia triticina and compare Nanopore MinION and PacBio HiFi sequence-based assemblies. We show that false-positive Hi-C contacts between haplotypes are predominantly caused by phase switches rather than by collapsed regions or Hi-C read mis-mappings. We introduce a method for phasing of dikaryotic genomes into the two haplotypes using Hi-C contact graphs, including a phase switch correction step. In the HiFi assembly, relatively few phase switches occur, and these are predominantly located at haplotig boundaries and can be readily corrected. In contrast, phase switches are widespread throughout the Nanopore assembly. We show that haploid genome read coverage of 30-40 times using HiFi sequencing is required for phasing of the leaf rust genome, with 0.7% heterozygosity, and that HiFi sequencing resolves genomic regions with low heterozygosity that are otherwise collapsed in the Nanopore assembly. This first Hi-C-based phasing pipeline for dikaryons and comparison of long-read sequencing technologies will inform future genome assembly and haplotype phasing projects in other non-haploid organisms.
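The abstract describes phasing haplotigs into two haplotypes from a Hi-C contact graph, with contacts between haplotypes treated as false positives. The following is only a minimal illustrative sketch of that idea, not the authors' pipeline: it assumes allelic haplotig pairs are already known, uses a simple greedy placement, and omits the phase switch correction step. All names (`phase_haplotigs`, `trans_fraction`, the toy haplotig IDs and link counts) are hypothetical.

```python
from collections import defaultdict


def phase_haplotigs(pairs, contacts):
    """Greedily assign allelic haplotig pairs to haplotypes 0 and 1.

    pairs    : list of (haplotig_a, haplotig_b), assumed to be allelic
    contacts : dict mapping (haplotig_x, haplotig_y) -> Hi-C link count
    Returns a dict haplotig -> 0 or 1.
    """
    # Symmetric adjacency view of the contact graph.
    links = defaultdict(dict)
    for (x, y), n in contacts.items():
        links[x][y] = n
        links[y][x] = n

    def support(node, hap, phase):
        # Links from `node` to haplotigs already placed in haplotype `hap`.
        return sum(n for other, n in links[node].items()
                   if phase.get(other) == hap)

    def degree(pair):
        return sum(links[pair[0]].values()) + sum(links[pair[1]].values())

    phase = {}
    # Place the best-connected pairs first (an assumption of this sketch).
    for a, b in sorted(pairs, key=degree, reverse=True):
        keep = support(a, 0, phase) + support(b, 1, phase)  # a -> 0, b -> 1
        swap = support(a, 1, phase) + support(b, 0, phase)  # a -> 1, b -> 0
        if swap > keep:
            a, b = b, a
        phase[a], phase[b] = 0, 1
    return phase


def trans_fraction(phase, contacts):
    """Fraction of Hi-C links that join haplotigs in different haplotypes."""
    total = sum(contacts.values())
    trans = sum(n for (x, y), n in contacts.items() if phase[x] != phase[y])
    return trans / total if total else 0.0


# Toy data (made up): two chromosomes, each with two allelic haplotigs.
toy_pairs = [("chr1_a", "chr1_b"), ("chr2_a", "chr2_b")]
toy_contacts = {
    ("chr1_a", "chr2_b"): 50,
    ("chr1_b", "chr2_a"): 45,
    ("chr1_a", "chr2_a"): 3,
    ("chr1_b", "chr2_b"): 2,
}
toy_phase = phase_haplotigs(toy_pairs, toy_contacts)
toy_trans = trans_fraction(toy_phase, toy_contacts)
```

In a dikaryon, the remaining between-haplotype fraction (`toy_trans` here) would be the quantity to inspect, since such links are expected to be false positives. A greedy pass like this can misplace pairs that share no links with haplotigs placed earlier, so it should be read as a starting point rather than a complete method.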
Identifiers
pubmed: 35337367
doi: 10.1186/s13059-022-02658-2
pii: 10.1186/s13059-022-02658-2
pmc: PMC8957140
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
84
Copyright information
© 2022. The Author(s).