KOREF_S1: phased, parental trio-binned Korean reference genome using long reads and Hi-C sequencing methods.
Hi-C
KOREF_S1
Korean reference
ONT PromethION
PacBio HiFi
hybrid assembly
Journal
GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date:
24 03 2022
History:
received: 26 07 2021
revised: 10 12 2021
accepted: 13 02 2022
entrez: 25 03 2022
pubmed: 26 03 2022
medline: 05 04 2022
Status:
ppublish
Abstract
BACKGROUND
KOREF is the Korean reference genome, which was constructed with various sequencing technologies, including long reads, short reads, and optical mapping. It is also the first East Asian multiomic reference genome accompanied by extensive clinical information, time-series and multiomic data, and parental sequencing data. However, it was still not a chromosome-scale reference. Here, we updated the previous KOREF assembly to a new chromosome-level haploid assembly, KOREF_S1v2.1. Oxford Nanopore Technologies (ONT) PromethION, Pacific Biosciences (PacBio) HiFi-CCS, and Hi-C technology were used to build the most accurate East Asian reference assembled so far.
RESULTS
We produced 705 Gb of ONT reads and 114 Gb of PacBio HiFi reads, and corrected the ONT reads using the PacBio reads. The corrected ultra-long reads reached a base error rate of 1.4%, a higher accuracy than the previous KOREF_S1v1.0, which was mainly built with short reads. KOREF has parental genome information, and we successfully phased it using a trio-binning method, yielding a near-complete haploid assembly. The final assembly had a total length of 2.9 Gb with an N50 of 150 Mb, and the longest scaffold covered 97.3% of GRCh38's chromosome 2. In addition, the final assembly showed high base accuracy, with <0.01% base errors.
CONCLUSIONS
KOREF_S1v2.1 is the first chromosome-scale haploid assembly of the Korean reference genome with high contiguity and accuracy. Our study provides useful resources for the Korean reference genome and demonstrates a new hybrid assembly strategy that combines ONT's PromethION and PacBio's HiFi-CCS.
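The abstract summarises the assembly with two metrics, N50 and base error rate. As a rough illustration of what those numbers mean, the sketch below computes an N50 from a list of sequence lengths and converts an error rate to a Phred-scaled quality value. It is not the authors' pipeline: the function names `n50` and `phred_qv` are hypothetical, and the lengths are made-up toy values rather than KOREF data.

```python
import math


def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def phred_qv(error_rate):
    """Convert a per-base error rate (a fraction, not a percentage)
    to a Phred-scaled quality value: QV = -10 * log10(error_rate)."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


# Toy scaffold lengths in Mb (hypothetical, not taken from the assembly).
toy_lengths = [150, 120, 90, 60, 30]
print(n50(toy_lengths))

# Error rates quoted in the abstract, expressed as fractions.
print(round(phred_qv(0.014), 1))   # corrected reads: 1.4% base errors
print(round(phred_qv(0.0001), 1))  # upper bound for the assembly: 0.01%
```

Under this conversion, a base error rate below 0.01% corresponds to a quality value above 40, while 1.4% corresponds to roughly 18.5.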
Identifiers
pubmed: 35333300
pii: 6554097
doi: 10.1093/gigascience/giac022
pmc: PMC8952264
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Comments and corrections
Type: ErratumIn
Copyright information
© The Author(s) 2022. Published by Oxford University Press GigaScience.