De novo assembly of the Indian blue peacock (Pavo cristatus) genome using Oxford Nanopore technology and Illumina sequencing.

MeSH terms: Animals; Avian Proteins / genetics; Galliformes / classification; Genome; Molecular Sequence Annotation; Nanopore Sequencing; Phylogeny; Whole Genome Sequencing

Keywords: Indian national bird; Oxford Nanopore; Pavo cristatus; genome assembly; peacock

Journal

GigaScience

ISSN: 2047-217X

Abbreviated title: Gigascience

Country: United States

NLM ID: 101596872

Publication information

Publication date: 1 May 2019

History:

received: 29 July 2018

revised: 30 September 2018

accepted: 18 March 2019

entrez: 12 May 2019

pubmed: 12 May 2019

medline: 24 December 2019

Status: ppublish

Abstract

BACKGROUND

The Indian peafowl (Pavo cristatus) is native to South Asia and is the national bird of India. Here we present a draft genome sequence of the male blue peacock using Illumina and Oxford Nanopore technology (ONT).

RESULTS

ONT sequencing gave ∼2.3-fold sequencing coverage, whereas Illumina generated 150-base pair paired-end sequence data at 284.6-fold coverage from 5 libraries. Subsequently, we generated a 0.915-gigabase pair de novo assembly of the peacock genome with a scaffold N50 of 0.23 megabase pairs (Mb). We predict that the peacock genome contains 23,153 protein-coding genes and 75.3 Mb (7.33%) of repetitive sequences.

CONCLUSIONS

We report a high-quality assembly of the peacock genome using a hybrid approach that combines sequences generated by both Illumina and ONT. The long reads generated by ONT were useful for addressing challenges related to de novo assembly, particularly at regions containing repetitive sequences that span longer than the short-read length and that short-read-based assembly alone could not resolve. Contig assembly of Illumina short reads gave an N50 of 1,639 bases, whereas with ONT, the N50 increased about 9-fold to 14,749 bases. The initial contig assembly based on Illumina sequencing reads alone gave 685,241 contigs. Further scaffolding of the assembled contigs using both Illumina and ONT sequencing reads resulted in a final assembly of 15,025 super-scaffolds, with an N50 of ∼0.23 Mb. Ninety-five percent of proteins predicted by homology matched those in a public repository, supporting the completeness of our assembly. Consistent with other phylogenetic studies of avian conserved genes, we found P. cristatus to be most closely related to Gallus gallus, followed by Meleagris gallopavo and Anas platyrhynchos. Compared with the recently published peacock genome assembly, the current hybrid assembly has greater sequencing depth, fewer non-ATGC sequences, and fewer scaffolds.
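The abstract compares assemblies by their N50 values. As a minimal sketch of how that statistic is conventionally defined (not the authors' pipeline), the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The function name `n50` and the toy lengths below are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in bases), for illustration only.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

In this toy case the two longest contigs already hold half of the 32 total bases, so the function reports 8. A larger N50 for the same total length indicates that more of the assembly sits in long sequences, which is why the abstract reports it for both the short-read and hybrid assemblies.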

Identifiers

DOI: 10.1093/gigascience/giz038

PMID: 31077316

PMC: PMC6511069

PII: 5488106

Chemical substances

Avian Proteins 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Authors

Ruby Dhar (R)

Ashikh Seethy (A)

Karthikeyan Pethusamy (K)

Sunil Singh (S)

Vishwajeet Rohil (V)

Kakali Purkayastha (K)

Indrani Mukherjee (I)

Sandeep Goswami (S)

Rakesh Singh (R)

Ankita Raj (A)

Tryambak Srivastava (T)

Sovon Acharya (S)

Balaji Rajashekhar (B)

Subhradip Karmakar (S)
