A Bos taurus sequencing methods benchmark for assembly, haplotyping, and variant calling.


Journal

Scientific Data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
08 06 2023
History:
received: 11 11 2022
accepted: 16 05 2023
medline: 12 6 2023
pubmed: 9 6 2023
entrez: 8 6 2023
Status: epublish

Abstract

Inspired by the production of reference data sets in the Genome in a Bottle project, we sequenced one Charolais heifer with several technologies: Illumina paired-end, Oxford Nanopore, Pacific Biosciences (HiFi and CLR), 10X Genomics linked-reads, and Hi-C. To generate haplotype-resolved assemblies, we also sequenced both parents with short reads. From these data, we built two high-quality haplotyped trio reference genomes and a consensus assembly, using up-to-date software packages. The assemblies obtained using PacBio HiFi reach a size of 3.2 Gb, which is substantially larger than the 2.7 Gb ARS-UCD1.2 reference. The consensus assembly reaches a BUSCO completeness of 95.8% for highly conserved mammalian genes. We also identified 35,866 structural variants larger than 50 base pairs. This assembly is a contribution to the bovine pangenome for the "Charolais" breed. These datasets should prove to be useful resources, enabling the community to gain additional insight into sequencing technologies for applications such as SNP, indel, or structural variant calling, and de novo assembly.

Identifiers

pubmed: 37291142
doi: 10.1038/s41597-023-02249-1
pii: 10.1038/s41597-023-02249-1
pmc: PMC10250393

Publication types

Dataset Journal Article

Languages

eng

Citation subsets

IM

Pagination

369

Copyright information

© 2023. The Author(s).

Authors

Camille Eché (C)

INRAE, US 1426, GeT-PlaGe, Genotoul, France Genomique, Université Fédérale de Toulouse, Castanet-Tolosan, France.

Carole Iampietro (C)

INRAE, US 1426, GeT-PlaGe, Genotoul, France Genomique, Université Fédérale de Toulouse, Castanet-Tolosan, France.

Clément Birbes (C)

Université Fédérale de Toulouse, INRAE, BioinfOmics, GenoToul Bioinformatics facility, 31326, Castanet-Tolosan, France.

Andreea Dréau (A)

Université Fédérale de Toulouse, INRAE, BioinfOmics, GenoToul Bioinformatics facility, 31326, Castanet-Tolosan, France.

Claire Kuchly (C)

INRAE, US 1426, GeT-PlaGe, Genotoul, France Genomique, Université Fédérale de Toulouse, Castanet-Tolosan, France.

Arnaud Di Franco (A)

Université Fédérale de Toulouse, INRAE, BioinfOmics, GenoToul Bioinformatics facility, 31326, Castanet-Tolosan, France.

Christophe Klopp (C)

Université Fédérale de Toulouse, INRAE, BioinfOmics, GenoToul Bioinformatics facility, 31326, Castanet-Tolosan, France.

Thomas Faraut (T)

GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Castanet-Tolosan, 31326, France.

Sarah Djebali (S)

GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Castanet-Tolosan, 31326, France.
IRSD, Université de Toulouse, INSERM, INRAE, ENVT, UPS, 31024, Toulouse, France.

Adrien Castinel (A)

INRAE, US 1426, GeT-PlaGe, Genotoul, France Genomique, Université Fédérale de Toulouse, Castanet-Tolosan, France.

Matthias Zytnicki (M)

Université Fédérale de Toulouse, INRAE, MIAT, 31326, Castanet-Tolosan, France.

Erwan Denis (E)

INRAE, US 1426, GeT-PlaGe, Genotoul, France Genomique, Université Fédérale de Toulouse, Castanet-Tolosan, France.

Mekki Boussaha (M)

Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France.

Cécile Grohs (C)

Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France.

Didier Boichard (D)

Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France.

Christine Gaspin (C)

Université Fédérale de Toulouse, INRAE, BioinfOmics, GenoToul Bioinformatics facility, 31326, Castanet-Tolosan, France.
Université Fédérale de Toulouse, INRAE, MIAT, 31326, Castanet-Tolosan, France.

Denis Milan (D)

INRAE, US 1426, GeT-PlaGe, Genotoul, France Genomique, Université Fédérale de Toulouse, Castanet-Tolosan, France.
GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Castanet-Tolosan, 31326, France.

Cécile Donnadieu (C)

INRAE, US 1426, GeT-PlaGe, Genotoul, France Genomique, Université Fédérale de Toulouse, Castanet-Tolosan, France. cecile.donnadieu@inrae.fr.
