Explainable artificial intelligence as a reliable annotator of archaeal promoter regions.
Journal
Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
31 01 2023
History:
received: 08 11 2022
accepted: 20 01 2023
entrez: 31 1 2023
pubmed: 1 2 2023
medline: 3 2 2023
Status:
epublish
Abstract
Archaea are a vast and unexplored cellular domain that thrives in a wide diversity of environments and plays central roles in processes mediating global carbon and nutrient fluxes. For these organisms to balance their metabolism, appropriate regulation of gene expression is essential. A key moment in regulating the genes responsible for life maintenance in archaea is when transcription factor proteins bind to the promoter element. This DNA segment is conserved, which enables its exploration by machine learning techniques. Here, we trained and tested a support vector machine with 3935 known archaeal promoter sequences. All promoter sequences were encoded as DNA Duplex Stability values. Afterwards, we performed a model interpretation task to map the decision pattern of the classification procedure. We also used a dataset of known promoter sequences for validation. Our results showed that an AT-rich region around position -27 (upstream of the transcription start site, TSS) is the most conserved in the analyzed organisms. In addition, we were able to identify the BRE element (-33), the PPE (at -10) and a position at +3, which together provide a more understandable picture of how promoters are organized in all archaeal organisms. Finally, we used the interpreted model to identify potential promoter sequences in 135 unannotated organisms, delivering regulatory region annotation of archaea at a scale never accomplished before (https://pcyt.unam.mx/gene-regulation/). We consider that this approach will be useful for understanding how gene regulation is achieved in other organisms, beyond the already established transcription factor binding sites.
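The encoding step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "DNA Duplex Stability" means replacing each dinucleotide step with a nearest-neighbor free energy; the abstract does not give the table used, so the values below (commonly cited unified nearest-neighbor parameters, in kcal/mol) and the names `NN_DG`, `step_energy` and `encode_stability` are assumptions for illustration only, not the authors' implementation.

```python
# ASSUMED nearest-neighbor free energies (kcal/mol, 37 C) for the ten unique
# dinucleotide steps; the paper's actual stability table is not stated in
# the abstract and may differ.
NN_DG = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def step_energy(step):
    """Return the free energy of one dinucleotide step (e.g. "TA")."""
    if step in NN_DG:
        return NN_DG[step]
    # A step and its reverse complement describe the same duplex stack.
    reverse_complement = step.translate(COMPLEMENT)[::-1]
    return NN_DG[reverse_complement]


def encode_stability(sequence):
    """Encode a DNA sequence as a list of len(sequence) - 1 stability values."""
    seq = sequence.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return [step_energy(seq[i:i + 2]) for i in range(len(seq) - 1)]


# Hypothetical usage: an AT-rich stretch is less stable (less negative)
# than a GC-rich stretch of the same length.
at_rich = encode_stability("TATAAA")
gc_rich = encode_stability("GCGCGC")
```

A numeric vector of this kind is one plausible input for a support vector machine, and it makes AT-rich regions such as the one reported around -27 stand out as stretches of low duplex stability.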
Identifiers
pubmed: 36720898
doi: 10.1038/s41598-023-28571-7
pii: 10.1038/s41598-023-28571-7
pmc: PMC9889792
Chemical substances
Transcription Factors
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1763
Copyright information
© 2023. The Author(s).