A Bos taurus sequencing methods benchmark for assembly, haplotyping, and variant calling.
Journal
Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date:
08 06 2023
History:
received: 11 11 2022
accepted: 16 05 2023
medline: 12 6 2023
pubmed: 9 6 2023
entrez: 8 6 2023
Status:
epublish
Abstract
Inspired by the production of reference data sets in the Genome in a Bottle project, we sequenced one Charolais heifer with different technologies: Illumina paired-end, Oxford Nanopore, Pacific Biosciences (HiFi and CLR), 10X Genomics linked-reads, and Hi-C. To generate haplotypic assemblies, we also sequenced both parents with short reads. From these data, we built two high-quality haplotyped trio reference genomes and a consensus assembly, using up-to-date software packages. The assemblies obtained using PacBio HiFi reach a size of 3.2 Gb, which is significantly larger than the 2.7 Gb ARS-UCD1.2 reference. The consensus assembly reaches a BUSCO completeness score of 95.8% among highly conserved mammalian genes. We also identified 35,866 structural variants larger than 50 base pairs. This assembly is a contribution to the bovine pangenome for the "Charolais" breed. These datasets will be useful resources, enabling the community to gain additional insight into sequencing technologies for applications such as SNP, indel, or structural variant calling, and de novo assembly.
Identifiers
pubmed: 37291142
doi: 10.1038/s41597-023-02249-1
pii: 10.1038/s41597-023-02249-1
pmc: PMC10250393
Publication types
Dataset
Journal Article
Languages
eng
Citation subsets
IM
Pagination
369
Copyright information
© 2023. The Author(s).