Draft genome assemblies using sequencing reads from Oxford Nanopore Technology and Illumina platforms for four species of North American Fundulus killifish.
Oxford Nanopore
genome assemblies
genomes
killifish
long reads
polish
Journal
GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date: 2020-06-01
History:
received: 2019-10-04
revised: 2020-04-16
accepted: 2020-05-27
entrez: 2020-06-20
pubmed: 2020-06-20
medline: 2021-10-05
Status: ppublish
Abstract
Whole-genome sequencing data from wild-caught individuals of four closely related North American killifish species (Fundulus xenicus, Fundulus catenatus, Fundulus nottii, and Fundulus olivaceus) were obtained using the long-read Oxford Nanopore Technologies (ONT) PromethION platform and short-read Illumina platforms. Draft de novo reference genome assemblies were generated using a combination of long and short sequencing reads. For each species, the PromethION platform was used to generate 30-45× sequence coverage, and the Illumina platform was used to generate 50-160× sequence coverage. Illumina-only assemblies were fragmented, with high numbers of contigs, while ONT-only assemblies were error-prone, with low BUSCO scores. The highest N50 values, ranging from 0.4 to 2.7 Mb, came from assemblies generated using a combination of short- and long-read data. For these assemblies, BUSCO scores were consistently >90% complete using the Eukaryota database. High-quality genome assemblies can be obtained by using short-read Illumina data to polish assemblies generated from long-read ONT data. Draft assemblies and raw sequencing data are available for public use. We encourage use and reuse of these data for assembly benchmarking and other analyses.
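The abstract compares assemblies by their N50 values. For readers unfamiliar with the metric, the following is a minimal sketch of the standard N50 definition: the length of the contig at which the running total, taken over contigs sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the example lengths are illustrative assumptions; this is not the code used by the authors.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Illustrative helper (hypothetical name, not from the paper).
    """
    # Sort contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running sum to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length


# Toy example: total length is 32 bp, so half is 16 bp.
# The two 8 bp contigs together reach 16 bp, so the N50 is 8.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A fragmented assembly with many short contigs reaches the halfway point only after many contigs have been summed, which is why the short-read-only assemblies described here would be expected to have lower N50 values than the combined assemblies.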
Identifiers
pubmed: 32556169
pii: 5859380
doi: 10.1093/gigascience/giaa067
pmc: PMC7301629
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2020. Published by Oxford University Press.