TFscope: systematic analysis of the sequence features involved in the binding preferences of transcription factors.
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date: 10 Jul 2024
History:
received: 09 Sep 2022
accepted: 24 Jun 2024
medline: 11 Jul 2024
pubmed: 11 Jul 2024
entrez: 10 Jul 2024
Status: epublish
Abstract
Characterizing the binding preferences of transcription factors (TFs) in different cell types and conditions is key to understanding how they orchestrate gene expression. Here, we develop TFscope, a machine learning approach that identifies the sequence features explaining the binding differences observed between two ChIP-seq experiments targeting either the same TF in two conditions or two TFs with similar motifs (paralogous TFs). TFscope systematically investigates differences in the core motif, the nucleotide environment, and co-factor motifs, and reports the contribution of each key feature in the two experiments. TFscope was applied to more than 305 ChIP-seq pairs, and several examples are discussed.
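The abstract describes TFscope as a classifier that contrasts two ChIP-seq experiments using three families of sequence features (core motif, nucleotide environment, co-factor motifs) and reports each feature's contribution. The sketch below illustrates only that general idea and is not TFscope's implementation. It assumes a hypothetical consensus-derived PWM standing in for the core motif, GC fraction standing in for the nucleotide environment, presence of a hypothetical co-factor k-mer (`CACGTG`) standing in for co-factor motifs, and an L1-penalized (lasso-style) logistic regression whose coefficients on standardized features are read as per-feature contributions. All function names, motifs and parameter values (`lam`, `lr`, `n_iter`) are illustrative choices.

```python
import math
import random

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def consensus_pwm(consensus, match=1.0, mismatch=-1.0):
    """Hypothetical toy PWM: +match for the consensus base, mismatch otherwise."""
    return [{b: (match if b == c else mismatch) for b in BASES} for c in consensus]


def max_pwm_score(seq, pwm):
    """Best additive PWM score over all windows on both strands."""
    w = len(pwm)
    best = -math.inf
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - w + 1):
            best = max(best, sum(col[b] for col, b in zip(pwm, strand[i:i + w])))
    return best


def features(seq, pwm, cofactor_kmer):
    """[core motif score, GC fraction, co-factor k-mer present (0/1)]."""
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    present = cofactor_kmer in seq or cofactor_kmer in revcomp(seq)
    return [max_pwm_score(seq, pwm), gc, 1.0 if present else 0.0]


def standardize(X):
    """Z-score each column; constant columns are centred only (sd replaced by 1)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=500):
    """L1-penalized logistic regression by proximal gradient descent (ISTA)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # intercept is not penalized
        # Gradient step, then soft-thresholding (proximal operator of lam * |w|_1).
        steps = [wj - lr * g for wj, g in zip(w, grad_w)]
        w = [math.copysign(max(abs(s) - lr * lam, 0.0), s) for s in steps]
    return w, b


def random_seq(rng, length, gc):
    return "".join(rng.choice("GC") if rng.random() < gc else rng.choice("AT")
                   for _ in range(length))


def implant(seq, kmer, rng):
    i = rng.randrange(len(seq) - len(kmer) + 1)
    return seq[:i] + kmer + seq[i + len(kmer):]


def toy_example(seed=0, n_per_class=100, length=100):
    """Synthetic 'experiments': same core motif, GC-richer flanks in experiment 1."""
    rng = random.Random(seed)
    core, cofactor = "TGAC", "CACGTG"  # hypothetical motifs
    pwm = consensus_pwm(core)
    X, y = [], []
    for label, gc in ((0, 0.4), (1, 0.6)):
        for _ in range(n_per_class):
            seq = random_seq(rng, length, gc)
            if rng.random() < 0.5:  # co-factor placed independently of the label
                seq = implant(seq, cofactor, rng)
            seq = implant(seq, core, rng)  # core motif present in every sequence
            X.append(features(seq, pwm, cofactor))
            y.append(label)
    return fit_l1_logistic(standardize(X), y)


if __name__ == "__main__":
    w, b = toy_example()
    for name, coef in zip(("core_motif", "gc_content", "cofactor"), w):
        print(f"{name}: {coef:+.3f}")
```

Because the two toy experiments share the same core motif and differ only in flanking GC content, the GC coefficient should dominate, the constant core-motif column stays at zero, and the label-independent co-factor is shrunk by the L1 penalty. The L1 penalty is used here as one plausible way to obtain a sparse, interpretable set of contributions.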
Identifiers
pubmed: 38987807
doi: 10.1186/s13059-024-03321-8
pii: 10.1186/s13059-024-03321-8
Chemical substances
Transcription Factors (registry number: 0)
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
187
Grants
Agency: Labex NUMEV (FR)
ID: MOTION project
Agency: SIRIC Montpellier
ID: MOTION project
Agency: Agence Nationale de la Recherche
ID: ANR-22-CE45-0031-01
Agency: Laboratoire d'Excellence EpiGenMed
ID: R-loops project
Copyright information
© 2024. The Author(s).