RACS: rapid analysis of ChIP-Seq data for contig based genomes.

Keywords: Bioinformatics pipeline; Chromatin immunoprecipitation; High-performance computing; Next generation sequencing; Tetrahymena thermophila

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
29 Oct 2019
History:
received: 22 03 2019
accepted: 13 09 2019
entrez: 31 10 2019
pubmed: 31 10 2019
medline: 28 12 2019
Status: epublish

Abstract

Chromatin immunoprecipitation coupled to next-generation sequencing (ChIP-Seq) is a widely used molecular method for investigating the function of chromatin-related proteins by identifying their associated DNA sequences on a genomic scale. ChIP-Seq generates large quantities of data that are difficult to process and analyze, particularly for organisms with contig-based genome sequences, which typically carry minimal gene annotation beyond gene coordinates, most of them predicted by gene-finding programs. Poor annotation makes comprehensive analysis of ChIP-Seq data difficult, and as a result standardized analysis pipelines are lacking. We present a one-stop computational pipeline, "Rapid Analysis of ChIP-Seq data" (RACS), that combines traditional High-Performance Computing (HPC) techniques with open-source tools to process and analyze raw ChIP-Seq data. RACS is open source and available from either of the following repositories: https://bitbucket.org/mjponce/RACS or https://gitrepos.scinet.utoronto.ca/public/?a=summary&p=RACS . RACS is particularly useful for aiding protein function discovery from ChIP-Seq data in organisms with poorly annotated, contig-based genomes. To test the performance and efficiency of RACS, we analyzed previously published ChIP-Seq data from the model organism Tetrahymena thermophila, which has a contig-based genome. We assessed the generality of RACS by analyzing a previously published data set from the model organism Oxytricha trifallax, whose genome sequence is also contig-based and poorly annotated. The RACS computational pipeline presented in this report is an efficient and reliable tool for analyzing genome-wide raw ChIP-Seq data from model organisms with poorly annotated, contig-based genome sequences.
Because RACS segregates the detected read accumulations into genic and intergenic regions, it is particularly efficient for rapid downstream analyses of proteins involved in gene expression.
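
The abstract does not describe how RACS performs the genic/intergenic split, so the following is only a minimal Python sketch of the general idea, not the RACS implementation. It assumes that read-accumulation regions and predicted gene coordinates are both available as hypothetical `(contig, start, end)` tuples in 0-based, half-open coordinates, and labels a region as genic if it overlaps any gene on the same contig.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical interval type: (contig, start, end), 0-based half-open coordinates.
Interval = Tuple[str, int, int]


def split_genic_intergenic(
    regions: List[Interval], genes: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Label each read-accumulation region as genic if it overlaps any gene
    on the same contig, otherwise as intergenic (illustrative sketch only)."""
    # Group gene coordinates by contig so each region is only compared
    # against genes on its own contig.
    genes_by_contig: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for contig, start, end in genes:
        genes_by_contig[contig].append((start, end))

    genic: List[Interval] = []
    intergenic: List[Interval] = []
    for contig, start, end in regions:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            g_start < end and start < g_end
            for g_start, g_end in genes_by_contig.get(contig, [])
        )
        (genic if overlaps else intergenic).append((contig, start, end))
    return genic, intergenic


if __name__ == "__main__":
    # Toy coordinates for illustration only; not data from the study.
    genes = [("contig1", 100, 500), ("contig2", 0, 300)]
    regions = [("contig1", 450, 600), ("contig1", 700, 800), ("contig3", 10, 50)]
    genic, intergenic = split_genic_intergenic(regions, genes)
    print("genic:", genic)
    print("intergenic:", intergenic)
```

The linear scan over genes keeps the sketch short; for genome-scale inputs, sorting genes per contig or using an interval tree (or a dedicated tool such as BEDTools) would be the more practical choice.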

Identifiers

pubmed: 31664892
doi: 10.1186/s12859-019-3100-2
pii: 10.1186/s12859-019-3100-2
pmc: PMC6819487

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

533

Grants

Agency: NSERC Discovery Grant
ID: RGPIN-2015-06448

Authors

Alejandro Saettone

Department of Chemistry and Biology, Ryerson University, 350 Victoria St, Toronto, M5B 2K3, Canada.

Marcelo Ponce

SciNet High Performance Computing Consortium, University of Toronto, 661 University Ave, Toronto, M5G 1M1, Canada.

Syed Nabeel-Shah

Department of Molecular Genetics, University of Toronto, 1 King's College Cir, Toronto, M5S 1A8, Canada.

Jeffrey Fillingham

Department of Chemistry and Biology, Ryerson University, 350 Victoria St, Toronto, M5B 2K3, Canada. jeffrey.fillingham@ryerson.ca.