Identifying transcription factors with cell-type specific DNA binding signatures.
Cell-type specificity
Deep learning
Differential binding
Transcription factor binding
Journal
BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258
Publication information
Publication date:
14 Oct 2024
History:
received: 10 Jul 2024
accepted: 02 Oct 2024
medline: 15 Oct 2024
pubmed: 15 Oct 2024
entrez: 14 Oct 2024
Status:
epublish
Abstract
Transcription factors (TFs) bind to different parts of the genome in different types of cells, but it is usually assumed that the inherent DNA-binding preferences of a TF are invariant to cell type. Yet, there are several known examples of TFs that switch their DNA-binding preferences in different cell types, and yet more examples of other mechanisms, such as steric hindrance or cooperative binding, that may result in a "DNA signature" of differential binding. To survey this phenomenon systematically, we developed a deep learning method we call SigTFB (Signatures of TF Binding) to detect and quantify cell-type specificity in a TF's known genomic binding sites. We used ENCODE ChIP-seq data to conduct a wide-scale investigation of 169 distinct TFs in up to 14 distinct cell types. SigTFB detected statistically significant DNA binding signatures in approximately two-thirds of TFs, far more than might have been expected from the relatively sparse evidence in prior literature. We found that the presence or absence of a cell-type specific DNA binding signature is distinct from, and indeed largely uncorrelated to, the degree of overlap between ChIP-seq peaks in different cell types, and tended to arise by two mechanisms: by using established motifs at different frequencies, and by selectively including motifs for distinct TFs. While recent results have highlighted cell state features such as chromatin accessibility and gene expression in predicting TF binding, our results emphasize that, for some TFs, the DNA sequences of the binding sites contain substantial cell-type specific motifs.
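The abstract contrasts DNA binding signatures with "the degree of overlap between ChIP-seq peaks in different cell types" but does not say how that overlap was measured. As a minimal sketch only, and not the authors' method, the Python below computes one common overlap measure, the base-pair Jaccard index between two peak sets. The function names and the `(chrom, start, end)` tuple layout with half-open coordinates are assumptions made for illustration.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_bp(intervals):
    """Total base pairs covered by a list of merged intervals."""
    return sum(end - start for start, end in intervals)


def intersect_bp(a, b):
    """Base pairs shared by two sorted, merged interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def peak_jaccard(peaks_a, peaks_b):
    """Base-pair Jaccard index between two peak sets.

    Each peak is a hypothetical (chrom, start, end) tuple; this layout is
    an assumption for the sketch, not a format taken from the paper.
    """
    by_chrom_a, by_chrom_b = defaultdict(list), defaultdict(list)
    for chrom, start, end in peaks_a:
        by_chrom_a[chrom].append((start, end))
    for chrom, start, end in peaks_b:
        by_chrom_b[chrom].append((start, end))

    shared_total = union_total = 0
    for chrom in set(by_chrom_a) | set(by_chrom_b):
        a = merge_intervals(by_chrom_a.get(chrom, []))
        b = merge_intervals(by_chrom_b.get(chrom, []))
        shared = intersect_bp(a, b)
        shared_total += shared
        union_total += covered_bp(a) + covered_bp(b) - shared
    return shared_total / union_total if union_total else 0.0
```

A value near 1 would mean the two cell types share most of their bound bases, and a value near 0 would mean almost none. Under the abstract's claim, a TF could score high or low on such a measure regardless of whether a sequence signature is detected.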
Identifiers
pubmed: 39402535
doi: 10.1186/s12864-024-10859-1
pii: 10.1186/s12864-024-10859-1
Chemical substances
Transcription Factors
0
DNA
9007-49-2
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
957
Grants
Agency: Queen Elizabeth Scholars
ID: QEII-GST
Agency: Natural Sciences and Engineering Research Council of Canada
ID: RGPIN-2019-06604
Agency: Alliance de recherche numérique du Canada
ID: RRG-TPERKINS
Copyright information
© 2024. The Author(s).