Improved BioCyc Operon Prediction: Revisiting the Operon Prediction Problem.


Journal

bioRxiv : the preprint server for biology
ISSN: 2692-8205
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187

Publication information

Publication date:
28 Jun 2024
History:
medline: 9 Jul 2024
pubmed: 9 Jul 2024
entrez: 9 Jul 2024
Status: epublish

Abstract

Operon prediction is a valuable component of microbial-genome annotation because operon organization can yield inferences about gene function, and because knowledge of operon structure can aid the interpretation of gene expression data. We present a number of improvements to the existing Pathway Tools operon predictor, based mostly on 7 new features that we hypothesized would increase its performance. The new features include shared Gene Ontology biological process terms, similarity of codon usage and GC content, correlated gene expression, and membership in a shared protein complex. We evaluated the 7 proposed features and found that adding 6 of them improved the accuracy of the operon predictor from 79.55% to 83.49%, a 19.3% decrease in error rate. When gene expression data were not included, accuracy decreased to 82.547%, still a 14.7% decrease in error rate relative to the original predictor. One of the proposed features and one previously used feature had no effect and were removed. Although some of the new features had strong predictive value individually, they did not have a large impact on predictive accuracy when combined with the other features, suggesting that they were not independent of the other features.
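
The error-rate figures quoted in the abstract can be checked with a short calculation. The sketch below is not taken from the paper: the function name is hypothetical, and it assumes that "error rate" means 100% minus accuracy and that the figure 82.547 is a percentage.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Percent decrease in error rate when accuracy (in percent)
    moves from baseline_acc to new_acc.

    Assumption: error rate = 100 - accuracy.
    """
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


# Accuracies quoted in the abstract (82.547 read as a percentage).
with_expression = relative_error_reduction(79.55, 83.49)
without_expression = relative_error_reduction(79.55, 82.547)

print(f"{with_expression:.1f}")     # 19.3
print(f"{without_expression:.1f}")  # 14.7
```

Under these assumptions both results match the abstract, which suggests that the "improvement of 14.7%" is a relative reduction in error rate rather than a gain in accuracy points.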

Identifiers

pubmed: 38979392
doi: 10.1101/2024.06.23.600222
pmc: PMC11230180

Publication types

Journal Article
Preprint

Languages

eng
