Reproducible inference of transcription factor footprints in ATAC-seq and DNase-seq datasets using protocol-specific bias modeling.
ATAC-seq
Bias correction
DNase-seq
Footprinting
Reproducibility
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date:
21 02 2019
History:
received: 18 04 2018
accepted: 13 02 2019
entrez: 23 2 2019
pubmed: 23 2 2019
medline: 22 3 2019
Status:
epublish
Abstract
DNase-seq and ATAC-seq are broadly used methods to assay open chromatin regions genome-wide. The single nucleotide resolution of DNase-seq has been further exploited to infer transcription factor binding sites (TFBSs) in regulatory regions through footprinting. Recent studies have demonstrated the sequence bias of DNase I and its adverse effects on footprinting efficiency. However, footprinting and the impact of sequence bias have not been extensively studied for ATAC-seq. Here, we undertake a systematic comparison of the two methods and show that a modification to the ATAC-seq protocol increases its yield and its agreement with DNase-seq data from the same cell line. We demonstrate that the two methods have distinct sequence biases and correct for these protocol-specific biases when performing footprinting. Despite the differences in footprint shapes, the locations of the inferred footprints in ATAC-seq and DNase-seq are largely concordant. However, the protocol-specific sequence biases in conjunction with the sequence content of TFBSs impact the discrimination of footprint from the background, which leads to one method outperforming the other for some TFs. Finally, we address the depth required for reproducible identification of open chromatin regions and TF footprints. We demonstrate that the impact of bias correction on footprinting performance is greater for DNase-seq than for ATAC-seq and that DNase-seq footprinting leads to better performance. It is possible to infer concordant footprints by using replicates, highlighting the importance of reproducibility assessment. The results presented here provide an overview of the advantages and limitations of footprinting analyses using ATAC-seq and DNase-seq.
Identifiers
pubmed: 30791920
doi: 10.1186/s13059-019-1654-y
pii: 10.1186/s13059-019-1654-y
pmc: PMC6385462
Chemical substances
Transcription Factors
0
Publication types
Comparative Study
Evaluation Study
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
42