DeepReg: a deep learning hybrid model for predicting transcription factors in eukaryotic and prokaryotic genomes.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
21 Apr 2024
History:
received: 13 Sep 2023
accepted: 11 Apr 2024
medline: 22 Apr 2024
pubmed: 22 Apr 2024
entrez: 21 Apr 2024
Status: epublish

Abstract

Deep learning models (DLMs) have gained importance in predicting, detecting, translating, and classifying a wide variety of inputs. In bioinformatics, DLMs have been used to predict protein structures, transcription factor-binding sites, and promoters. In this work, we propose a hybrid model, named Deep Regulation (DeepReg), to identify transcription factors (TFs) among prokaryotic and eukaryotic protein sequences. The model combines two architectures: a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network. DeepReg reached a precision of 0.99, a recall of 0.97, and an F1-score of 0.98. The quality of our predictions, the bias-variance trade-off, and the characterization of new TF predictions were evaluated and compared against those produced by DeepTFactor, as well as against experimental data from three model organisms. Predictions from DeepReg tended to exhibit less variance and bias than those from DeepTFactor, indicating higher reliability and less overfitting.
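
The reported F1-score can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch of that arithmetic only; the function name `f1_score` is a hypothetical helper written for this illustration and is not taken from the DeepReg code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported_precision = 0.99
reported_recall = 0.97

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")  # F1 = 0.9799, which rounds to the reported 0.98
```

The rounded result agrees with the F1-score of 0.98 stated in the abstract.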

Identifiers

pubmed: 38644393
doi: 10.1038/s41598-024-59487-5
pii: 10.1038/s41598-024-59487-5

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

9155

Grants

Agency: Consejo Nacional de Humanidades, Ciencias y Tecnologías
ID: 857463
Agency: DGAPA-UNAM
ID: IN-220523

Copyright information

© 2024. The Author(s).

References

Privalov, P. L. & Crane-Robinson, C. Forces maintaining the DNA double helix and its complexes with transcription factors. Prog. Biophys. Mol. Biol. 135, 30–48. https://doi.org/10.1016/j.pbiomolbio.2018.01.007 (2018).
doi: 10.1016/j.pbiomolbio.2018.01.007 pubmed: 29378224
Fulton, D. L. et al. TFCat: The curated catalog of mouse and human transcription factors. Genome Biol. 10, R29. https://doi.org/10.1186/gb-2009-10-3-r29 (2009).
doi: 10.1186/gb-2009-10-3-r29 pubmed: 19284633 pmcid: 2691000
Lemon, B. & Tjian, R. Orchestrated response: A symphony of transcription factors for gene control. Genes Dev. 14, 2551–2569. https://doi.org/10.1101/gad.831000 (2000).
doi: 10.1101/gad.831000 pubmed: 11040209
Shelest, E. Transcription factors in fungi. FEMS Microbiol. Lett. 286, 145–151. https://doi.org/10.1111/j.1574-6968.2008.01293.x (2008).
doi: 10.1111/j.1574-6968.2008.01293.x pubmed: 18789126
Martinez-Liu, L. et al. Comparative genomics of DNA-binding transcription factors in archaeal and bacterial organisms. PLoS One 16, e0254025. https://doi.org/10.1371/journal.pone.0254025 (2021).
doi: 10.1371/journal.pone.0254025 pubmed: 34214112 pmcid: 8253408
Flores-Bautista, E. et al. Deciphering the functional diversity of DNA-binding transcription factors in bacteria and archaea organisms. PLoS One 15, e0237135. https://doi.org/10.1371/journal.pone.0237135 (2020).
doi: 10.1371/journal.pone.0237135 pubmed: 32822422 pmcid: 7446807
Ledesma, L., Hernandez-Guerrero, R. & Perez-Rueda, E. Prediction of DNA-binding transcription factors in bacteria and archaea genomes. In Prokaryotic Gene Regulation (eds Peeters, E. & Bervoets, I.) 103–112 (Springer US, 2022). https://doi.org/10.1007/978-1-0716-2413-5_7 .
doi: 10.1007/978-1-0716-2413-5_7
Kim, G. B., Gao, Y., Palsson, B. O. & Lee, S. Y. DeepTFactor: A deep learning-based tool for the prediction of transcription factors. Proc. Natl. Acad. Sci. https://doi.org/10.1073/pnas.2021171118 (2020).
doi: 10.1073/pnas.2021171118 pubmed: 33443222 pmcid: 7812747
Du, Z., Huang, T., Uversky, V. N. & Li, J. Predicting TF proteins by incorporating evolution information through PSSM. IEEE/ACM Trans. Comput. Biol. Bioinf. https://doi.org/10.1109/tcbb.2022.3199758 (2022).
doi: 10.1109/tcbb.2022.3199758
Wang, S., Cheng, X., Li, Y., Wu, M. & Zhao, Y. Image-based promoter prediction: A promoter prediction method based on evolutionarily generated patterns. Sci. Rep. https://doi.org/10.1038/s41598-018-36308-0 (2018).
doi: 10.1038/s41598-018-36308-0 pubmed: 30591712 pmcid: 6308232
Ryu, J. Y., Kim, H. U. & Lee, S. Y. Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers. Proc. Natl. Acad. Sci. U. S. A. https://doi.org/10.1073/pnas.1821905116 (2019).
doi: 10.1073/pnas.1821905116 pubmed: 31843930 pmcid: 6936484
Zhao, J., Yan, W. & Yang, Y. Deeptp: A deep learning model for thermophilic protein prediction. Int. J. Mol. Sci. https://doi.org/10.3390/ijms24032217 (2023).
doi: 10.3390/ijms24032217 pubmed: 38203692 pmcid: 10779407
Oubounyt, M., Louadi, Z., Tayara, H. & Chong, K. T. DeePromoter: Robust promoter predictor using deep learning. Front. Genet. https://doi.org/10.3389/fgene.2019.00286 (2019).
doi: 10.3389/fgene.2019.00286 pubmed: 31024615 pmcid: 6460014
Shujaat, M., Wahab, A., Tayara, H. & Chong, K. T. pcPromoter-CNN: A CNN-based prediction and classification of promoters. Genes 11, 1529. https://doi.org/10.3390/genes11121529 (2020).
doi: 10.3390/genes11121529 pubmed: 33371507 pmcid: 7767505
Min, X., Ye, C., Liu, X. & Zeng, X. Predicting enhancer-promoter interactions by deep learning and matching heuristic. Brief. Bioinf. https://doi.org/10.1093/bib/bbaa254 (2020).
doi: 10.1093/bib/bbaa254
Quang, D. & Xie, X. DanQ: A hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 44, e107–e107. https://doi.org/10.1093/nar/gkw226 (2016).
doi: 10.1093/nar/gkw226 pubmed: 27084946 pmcid: 4914104
Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838. https://doi.org/10.1038/nbt.3300 (2015).
doi: 10.1038/nbt.3300 pubmed: 26213851
Min, S., Lee, B. & Yoon, S. Deep learning in bioinformatics. Brief. Bioinf. https://doi.org/10.1093/bib/bbw068 (2016).
doi: 10.1093/bib/bbw068
Routhier, E. & Mozziconacci, J. Genomics enters the deep learning era. PeerJ 10, e13613. https://doi.org/10.7717/peerj.13613 (2022).
doi: 10.7717/peerj.13613 pubmed: 35769139 pmcid: 9235815
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589. https://doi.org/10.1038/s41586-021-03819-2 (2021).
doi: 10.1038/s41586-021-03819-2 pubmed: 34265844 pmcid: 8371605
Apweiler, R. UniProt: The universal protein knowledgebase. Nucleic Acids Res. 32, D115–D119. https://doi.org/10.1093/nar/gkh131 (2004).
doi: 10.1093/nar/gkh131
Gal, Y. & Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. arXiv https://doi.org/10.48550/ARXIV.1506.02142 (2015).
doi: 10.48550/ARXIV.1506.02142
Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv https://doi.org/10.48550/ARXIV.1409.0473 (2014).
doi: 10.48550/ARXIV.1409.0473
Rodríguez-Martínez, J. A., Reinke, A. W., Bhimsaria, D., Keating, A. E. & Ansari, A. Z. Combinatorial bzip dimers display complex DNA-binding specificity landscapes. eLife https://doi.org/10.7554/elife.19272 (2017).
doi: 10.7554/elife.19272 pubmed: 28186491 pmcid: 5349851
Bobola, N. & Merabet, S. Homeodomain proteins in action: Similar DNA binding preferences, highly variable connectivity. Curr. Opin. Genet. Dev. 43, 1–8. https://doi.org/10.1016/j.gde.2016.09.008 (2017).
doi: 10.1016/j.gde.2016.09.008 pubmed: 27768937
Teixeira, M. C. et al. YEASTRACT+: A portal for the exploitation of global transcription regulation and metabolic model data in yeast biotechnology and pathogenesis. Nucleic Acids Res. 51, D785–D791 (2022).
doi: 10.1093/nar/gkac1041 pmcid: 9825512
Hu, Y. et al. Corrigendum: fmicb.2018.0271. Curation of transcriptional regulatory interactions in Aspergillus nidulans and Neurospora crassa reveal structural and evolutionary features of the regulatory networks. Front. Microbiol. 9, 2713. https://doi.org/10.3389/fmicb.2018.0271 (2018).
doi: 10.3389/fmicb.2018.0271 pubmed: 30455682 pmcid: 6236125
Ren, C., Zeng, L. & Zhou, M.-M. Preparation, biochemical analysis, and structure determination of the bromodomain, an acetyl-lysine binding domain. In Methods in Enzymology (eds Ren, C. et al.) 321–343 (Elsevier, 2016). https://doi.org/10.1016/bs.mie.2016.01.018 .
doi: 10.1016/bs.mie.2016.01.018
Watanabe, F. The role of charge neutralization and cooperative binding of linker histone in the higher-order structure of chromatin. FEBS Lett. 249, 147–150. https://doi.org/10.1016/0014-5793(89)80612-x (1989).
doi: 10.1016/0014-5793(89)80612-x pubmed: 2737276
Geman, S., Bienenstock, E. & Doursat, R. Neural networks and the bias/variance dilemma. Neural Comput. 4, 1–58. https://doi.org/10.1162/neco.1992.4.1.1 (1992).
doi: 10.1162/neco.1992.4.1.1
Doroudi, S. The bias-variance tradeoff: How data science can inform educational debates. AERA Open 6, 233285842097720. https://doi.org/10.1177/2332858420977208 (2020).
doi: 10.1177/2332858420977208

Authors

Leonardo Ledesma-Dominguez (L)

Posgrado en Ciencia en Ingeniería de la Computación, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico. leonardoledd@ciencias.unam.mx.
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, UNAM, 04510, Mexico City, México. leonardoledd@ciencias.unam.mx.

Erik Carbajal-Degante (E)

Coordinación de Universidad Abierta y Educación Digital (CUAED), Universidad Nacional Autónoma de México, 04510, Mexico City, México.

Gabriel Moreno-Hagelsieb (G)

Department of Biology, Wilfrid Laurier University, Waterloo, ON, Canada.

Ernesto Perez-Rueda (E)

Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Unidad Académica del Estado de Yucatán, Universidad Nacional Autónoma de México, Mérida, Yucatán, México. ernesto.perez@iimas.unam.mx.

MeSH classifications