Explainable artificial intelligence as a reliable annotator of archaeal promoter regions.


Journal

Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
31 01 2023
History:
received: 08 11 2022
accepted: 20 01 2023
entrez: 31 1 2023
pubmed: 1 2 2023
medline: 3 2 2023
Status: epublish

Abstract

Archaea form a vast and largely unexplored cellular domain that thrives in a wide range of environments and plays central roles in processes mediating global carbon and nutrient fluxes. For these organisms to balance their metabolism, appropriate regulation of gene expression is essential. A key moment in regulating the genes responsible for life maintenance in archaea is the binding of transcription factor proteins to the promoter element. This DNA segment is conserved, which makes it amenable to machine learning techniques. Here, we trained and tested a support vector machine with 3935 known archaeal promoter sequences. All promoter sequences were encoded as DNA duplex stability values. Afterwards, we performed a model interpretation task to map the decision pattern of the classification procedure. We also used a dataset of known promoter sequences for validation. Our results showed that an AT-rich region around position -27 (upstream of the transcription start site, TSS) is the most conserved in the analyzed organisms. In addition, we were able to identify the BRE element (-33), the PPE (at -10) and a position at +3, which together provide a more understandable picture of how promoters are organized across archaeal organisms. Finally, we used the interpreted model to identify potential promoter sequences in 135 unannotated organisms, delivering regulatory region annotation of archaea on a scale never accomplished before ( https://pcyt.unam.mx/gene-regulation/ ). We expect this approach to be useful for understanding how gene regulation is achieved in other organisms, beyond the already established transcription factor binding sites.
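As a rough illustration of the encoding step mentioned in the abstract, the sketch below converts a DNA sequence into a vector of per-dinucleotide duplex stability values that a classifier such as a support vector machine could take as input. It is only a sketch under stated assumptions: the free-energy table follows the commonly cited unified nearest-neighbor parameters (SantaLucia, 1998), the names NN_ENERGY and encode_stability are hypothetical, and the abstract does not state which parameter set or window the authors actually used.

```python
# Assumed nearest-neighbor free energies (kcal/mol) for the ten unique
# dinucleotide steps; the paper's own parameter set may differ.
_NN_UNIQUE = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


# A step and its reverse complement describe the same duplex, so they
# share one value; this expands the ten entries to all sixteen steps.
NN_ENERGY = dict(_NN_UNIQUE)
for _step, _dg in _NN_UNIQUE.items():
    NN_ENERGY.setdefault(reverse_complement(_step), _dg)


def encode_stability(seq):
    """Map a DNA sequence of length n to n - 1 stability values,
    one per overlapping dinucleotide step (hypothetical helper)."""
    seq = seq.upper()
    # Raises KeyError on characters other than A, C, G, T.
    return [NN_ENERGY[seq[i:i + 2]] for i in range(len(seq) - 1)]


if __name__ == "__main__":
    at_rich = encode_stability("TTTATATA")
    gc_rich = encode_stability("GCGCGCGC")
    # Less negative values mean a less stable duplex.
    print(sum(at_rich) / len(at_rich), sum(gc_rich) / len(gc_rich))
```

Under this encoding an AT-rich stretch, such as the region the abstract reports around position -27, shows up as a run of less negative values than GC-rich flanking sequence, which gives a numeric signal a classifier can pick up.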

Identifiers

pubmed: 36720898
doi: 10.1038/s41598-023-28571-7
pii: 10.1038/s41598-023-28571-7
pmc: PMC9889792

Chemical substances

Transcription Factors 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

1763

Copyright information

© 2023. The Author(s).

Authors

Gustavo Sganzerla Martinez (G)

Programa de Pós-Graduação em Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, RS, Brazil.

Ernesto Perez-Rueda (E)

Unidad Académica de Yucatán, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Yucatán, Mérida, Mexico.

Aditya Kumar (A)

Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam, 784028, India.

Sharmilee Sarkar (S)

Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam, 784028, India.

Scheila de Avila E Silva (S)

Programa de Pós-Graduação em Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, RS, Brazil. sasilva6@ucs.br.
