De novo assembly of the Indian blue peacock (Pavo cristatus) genome using Oxford Nanopore technology and Illumina sequencing.
Indian national bird
Oxford Nanopore
Pavo cristatus
genome assembly
peacock
Journal
GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date:
01 05 2019
History:
received: 29 07 2018
revised: 30 09 2018
accepted: 18 03 2019
entrez: 12 5 2019
pubmed: 12 5 2019
medline: 24 12 2019
Status:
ppublish
Abstract
BACKGROUND
The Indian peafowl (Pavo cristatus) is native to South Asia and is the national bird of India. Here we present a draft genome sequence of the male blue peacock using Illumina and Oxford Nanopore technology (ONT).
RESULTS
ONT sequencing gave ∼2.3-fold sequencing coverage, whereas Illumina generated 150-base pair paired-end sequence data at 284.6-fold coverage from 5 libraries. Subsequently, we generated a 0.915-gigabase pair de novo assembly of the peacock genome with a scaffold N50 of 0.23 megabase pairs (Mb). We predict that the peacock genome contains 23,153 protein-coding genes and 75.3 Mb (7.33%) of repetitive sequences.
CONCLUSIONS
We report a high-quality assembly of the peacock genome using a hybrid approach that combines sequences generated by both Illumina and ONT. The long reads generated by ONT were useful for addressing challenges related to de novo assembly, particularly at regions containing repetitive sequences that span longer than the short-read length and could not be resolved with short-read-based assembly alone. Contig assembly of Illumina short reads gave an N50 of 1,639 bases, whereas with ONT, the N50 increased about 9-fold to 14,749 bases. The initial contig assembly based on Illumina sequencing reads alone gave 685,241 contigs. Further scaffolding of the assembled contigs using both Illumina and ONT sequencing reads resulted in a final assembly of 15,025 super-scaffolds, with an N50 of ∼0.23 Mb. Ninety-five percent of proteins predicted by homology matched those in a public repository, supporting the completeness of our assembly. Consistent with other phylogenetic studies of avian conserved genes, we found P. cristatus to be most closely related to Gallus gallus, followed by Meleagris gallopavo and Anas platyrhynchos. Compared with the recently published peacock genome assembly, the current, superior, hybrid assembly has greater sequencing depth, fewer non-ATGC sequences, and fewer scaffolds.
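The abstract summarizes assembly contiguity with N50 values. As a reading aid, the following is a minimal sketch of how N50 is conventionally computed from a list of sequence lengths: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly. The function name `n50` and the toy length lists are hypothetical illustrations, not data or code from the study; only the two contig N50 values (1,639 and 14,749 bases) are taken from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition (an assumption here, since the record does not
    state which tool or definition the authors used): sort lengths from
    longest to shortest and return the length at which the running total
    first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: six contigs totalling 30 bases.
toy_contigs = [2, 3, 4, 5, 6, 10]

# Hypothetical scaffolding of the same bases into three longer pieces
# (10+6, 5+4, 3+2); joining contigs raises N50 without adding sequence.
toy_scaffolds = [16, 9, 5]

contig_n50 = n50(toy_contigs)
scaffold_n50 = n50(toy_scaffolds)

# Ratio of the two contig N50 values reported in the abstract.
fold_increase = 14_749 / 1_639

print(contig_n50, scaffold_n50, round(fold_increase, 2))
```

On the toy data, merging contigs into scaffolds raises N50 while the total length stays the same, which is the effect the abstract attributes to adding ONT reads. The ratio of the two reported contig N50 values comes out at roughly 9.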
Identifiers
pubmed: 31077316
pii: 5488106
doi: 10.1093/gigascience/giz038
pmc: PMC6511069
Chemical substances
Avian Proteins
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2019. Published by Oxford University Press.