RACS: rapid analysis of ChIP-Seq data for contig based genomes.

Chromatin Immunoprecipitation Sequencing; Chromosome Mapping; Genome; Genomics / methods; Humans; Molecular Sequence Annotation; Sequence Analysis, DNA

Bioinformatics pipeline; Chromatin immunoprecipitation; High-performance computing; Next generation sequencing; Tetrahymena thermophila

Journal

BMC bioinformatics

ISSN: 1471-2105

Abbreviated title: BMC Bioinformatics

Country: England

NLM ID: 100965194

Publication information

Publication date:
29 Oct 2019

History:

received: 22 Mar 2019

accepted: 13 Sep 2019

entrez: 31 Oct 2019

pubmed: 31 Oct 2019

medline: 28 Dec 2019

Status: epublish

Abstract

BACKGROUND: Chromatin immunoprecipitation coupled to next generation sequencing (ChIP-Seq) is a widely used molecular method to investigate the function of chromatin-related proteins by identifying their associated DNA sequences on a genomic scale. ChIP-Seq generates large quantities of data that are difficult to process and analyze, particularly for organisms with contig-based sequenced genomes, which typically have minimal annotation of their genes beyond coordinates predicted primarily by gene-finding programs. Poorly annotated genome sequences make comprehensive analysis of ChIP-Seq data difficult, and as such standardized analysis pipelines are lacking.

RESULTS: We present a one-stop computational pipeline, "Rapid Analysis of ChIP-Seq data" (RACS), that utilizes traditional High-Performance Computing (HPC) techniques in association with open source tools for processing and analyzing raw ChIP-Seq data. RACS is an open source computational pipeline available from either of the following repositories: https://bitbucket.org/mjponce/RACS or https://gitrepos.scinet.utoronto.ca/public/?a=summary&p=RACS. To aid protein function discovery, RACS is particularly useful for ChIP-Seq in organisms with contig-based genomes that have poor gene annotation. To test the performance and efficiency of RACS, we analyzed previously published ChIP-Seq data from the model organism Tetrahymena thermophila, which has a contig-based genome. We assessed the generality of RACS by analyzing a previously published data set generated using the model organism Oxytricha trifallax, whose genome sequence is also contig-based with poor annotation.

CONCLUSIONS: The RACS computational pipeline presented in this report is an efficient and reliable tool to analyze genome-wide raw ChIP-Seq data generated in model organisms with poorly annotated contig-based genome sequences. Because RACS segregates the read accumulations it finds into genic and intergenic regions, it is particularly efficient for rapid downstream analyses of proteins involved in gene expression.
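As an illustration of the genic/intergenic segregation mentioned in the abstract, the following is a minimal sketch and not the RACS implementation. It assumes that read accumulations and predicted genes are both given as (contig, start, end) tuples with inclusive coordinates; the function name, the contig names, and the sample coordinates are hypothetical.

```python
from collections import defaultdict


def split_genic_intergenic(accumulations, genes):
    """Split read accumulations into genic and intergenic lists.

    Both arguments are iterables of (contig, start, end) tuples with
    inclusive coordinates. An accumulation counts as genic if it
    overlaps at least one gene on the same contig (an assumption made
    for this sketch; other overlap criteria are possible).
    """
    # Index gene intervals by contig so each accumulation is compared
    # only against genes on its own contig.
    genes_by_contig = defaultdict(list)
    for contig, start, end in genes:
        genes_by_contig[contig].append((start, end))

    genic, intergenic = [], []
    for contig, start, end in accumulations:
        overlaps = any(
            start <= g_end and g_start <= end
            for g_start, g_end in genes_by_contig.get(contig, [])
        )
        (genic if overlaps else intergenic).append((contig, start, end))
    return genic, intergenic


# Hypothetical example data: two annotated genes, three accumulations.
genes = [("scf_1", 100, 500), ("scf_2", 50, 300)]
accumulations = [("scf_1", 450, 600), ("scf_1", 700, 900), ("scf_3", 10, 80)]
genic, intergenic = split_genic_intergenic(accumulations, genes)
```

Indexing genes by contig keeps the comparison local to each contig, which matters for contig-based assemblies where coordinates restart on every contig. An accumulation on a contig with no annotated genes is reported as intergenic.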


Identifiers

DOI: 10.1186/s12859-019-3100-2

PMID: 31664892

PMC: PMC6819487

pii: 10.1186/s12859-019-3100-2

Publication types

Journal Article

Languages

eng

Pagination

533

Grants

Agency: NSERC Discovery Grant

ID: RGPIN-2015-06448

Authors

Alejandro Saettone (A)

Marcelo Ponce (M)

Syed Nabeel-Shah (S)

Jeffrey Fillingham (J)
