RACS: rapid analysis of ChIP-Seq data for contig based genomes.
Bioinformatics pipeline
Chromatin immunoprecipitation
High-performance computing
Next generation sequencing
Tetrahymena thermophila
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
29 Oct 2019
History:
received: 22 Mar 2019
accepted: 13 Sep 2019
entrez: 31 Oct 2019
pubmed: 31 Oct 2019
medline: 28 Dec 2019
Status:
epublish
Abstract
BACKGROUND: Chromatin immunoprecipitation coupled to next generation sequencing (ChIP-Seq) is a widely used molecular method to investigate the function of chromatin-related proteins by identifying their associated DNA sequences on a genomic scale. ChIP-Seq generates large quantities of data that are difficult to process and analyze, particularly for organisms with contig-based sequenced genomes, whose gene sets typically carry minimal annotation beyond coordinates predicted primarily by gene-finding programs. Poorly annotated genome sequences make comprehensive analysis of ChIP-Seq data difficult, and as a result standardized analysis pipelines are lacking.
RESULTS: We present a one-stop computational pipeline, "Rapid Analysis of ChIP-Seq data" (RACS), that uses traditional High-Performance Computing (HPC) techniques together with open source tools to process and analyze raw ChIP-Seq data. RACS is open source and available from either of two repositories (https://bitbucket.org/mjponce/RACS or https://gitrepos.scinet.utoronto.ca/public/?a=summary&p=RACS). RACS is particularly useful for ChIP-Seq in organisms with contig-based genomes that have poor gene annotation, where it can aid protein function discovery. To test the performance and efficiency of RACS, we analyzed previously published ChIP-Seq data from the model organism Tetrahymena thermophila, which has a contig-based genome. We assessed the generality of RACS by analyzing a previously published data set generated in the model organism Oxytricha trifallax, whose genome sequence is also contig-based and poorly annotated.
CONCLUSIONS: The RACS computational pipeline presented in this report is an efficient and reliable tool for analyzing genome-wide raw ChIP-Seq data generated in model organisms with poorly annotated contig-based genome sequences. Because RACS segregates the detected read accumulations into genic and intergenic regions, it is particularly efficient for rapid downstream analyses of proteins involved in gene expression.
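The genic/intergenic segregation mentioned in the abstract can be pictured with a minimal sketch. This is an illustration only and not the RACS implementation: the function names, the tuple layout, the contig and gene identifiers, and the assumption of 1-based inclusive coordinates are all hypothetical. It labels a read-accumulation region as genic when it overlaps at least one predicted gene on the same contig, and as intergenic otherwise.

```python
from collections import defaultdict


def build_gene_index(genes):
    """Group (contig, start, end, gene_id) records by contig.

    Coordinates are assumed to be 1-based and inclusive (an assumption
    made for this sketch, not something stated in the abstract).
    """
    index = defaultdict(list)
    for contig, start, end, gene_id in genes:
        index[contig].append((start, end, gene_id))
    return index


def classify_region(index, contig, start, end):
    """Label a region 'genic' if it overlaps any gene on its contig.

    Returns a (label, overlapping_gene_ids) pair. A simple linear scan
    is used here for clarity rather than speed.
    """
    hits = [
        gene_id
        for g_start, g_end, gene_id in index.get(contig, ())
        if g_start <= end and start <= g_end  # closed-interval overlap test
    ]
    return ("genic", hits) if hits else ("intergenic", [])


# Hypothetical gene coordinates and read-accumulation regions.
genes = [
    ("scf_001", 100, 500, "GENE_A"),
    ("scf_001", 900, 1200, "GENE_B"),
]
regions = [
    ("scf_001", 450, 600),  # overlaps the 3' end of GENE_A
    ("scf_001", 600, 800),  # falls between the two genes
    ("scf_002", 10, 50),    # contig with no predicted genes
]

index = build_gene_index(genes)
labels = [classify_region(index, *region) for region in regions]
for region, (label, gene_ids) in zip(regions, labels):
    print(region, label, gene_ids)
```

Working per contig means the sketch needs nothing beyond the gene coordinates themselves, which is the only annotation the abstract says is typically available for these genomes.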
Identifiers
pubmed: 31664892
doi: 10.1186/s12859-019-3100-2
pii: 10.1186/s12859-019-3100-2
pmc: PMC6819487
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
533
Grants
Agency: NSERC Discovery Grant
ID: RGPIN-2015-06448