BaPreS: a software tool for predicting bacteriocins using an optimal set of features.
Antibiotic resistance
Bacteriocin prediction
Deep learning
Feature selection
Machine learning
Sequence matching
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
17 Aug 2023
History:
received: 27 Oct 2022
accepted: 09 May 2023
medline: 21 Aug 2023
pubmed: 18 Aug 2023
entrez: 17 Aug 2023
Status:
epublish
Abstract
Antibiotic resistance is a major public health concern around the globe. As a result, researchers continually search for new compounds from which to develop antibiotic drugs that combat antibiotic-resistant bacteria. Bacteriocins are promising antimicrobial agents against antibiotic resistance because they can have either broad or narrow killing spectra. Sequence matching methods are widely used to identify bacteriocins by comparing candidates with known bacteriocin sequences; however, these methods often fail to detect new bacteriocin sequences because bacteriocins are highly diverse. A machine learning approach can help find new, highly dissimilar bacteriocins for developing highly effective antibiotic drugs. The aim of this work is to develop a machine learning-based software tool called BaPreS (Bacteriocin Prediction Software) that uses an optimal set of features to detect bacteriocin protein sequences with high accuracy. We extracted candidate features from known bacteriocin and non-bacteriocin sequences by considering the physicochemical and structural properties of the protein sequences. We then reduced the feature set using statistical justifications and the recursive feature elimination technique. Finally, we built support vector machine (SVM) and random forest (RF) models using the selected features and used the best-performing model to implement the software tool. We applied BaPreS to an established dataset and evaluated its prediction performance. The results show that the software tool achieves a prediction accuracy of 95.54% on test protein sequences. The tool allows users to add new bacteriocin or non-bacteriocin sequences to the training dataset to further enhance its predictive power. We compared the prediction performance of BaPreS with that of a popular sequence matching-based tool and a deep learning-based method, and our software tool outperformed both.
BaPreS is a bacteriocin prediction tool that can be used to discover new, highly dissimilar bacteriocins for developing highly effective antibiotic drugs. The software runs on Windows, Linux and macOS. The open-source software package and its user manual are available on GitHub: https://github.com/suraiya14/BaPreS
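To make the feature-extraction step in the abstract concrete, the following is a minimal sketch of how simple sequence-derived features could be computed from a protein sequence. It is not the BaPreS implementation: the function name `extract_features`, the residue groupings, and the specific features (amino acid fractions, hydrophobic fraction, a crude net charge, length) are illustrative assumptions, and the abstract does not state which features the tool actually uses.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative residue groups; these groupings are assumptions made for
# this sketch and are not taken from the BaPreS paper.
HYDROPHOBIC = set("AVILMFWC")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def extract_features(sequence):
    """Return a dict of simple composition-based features for one sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    counts = Counter(seq)
    n = len(seq)

    # Fraction of each standard amino acid in the sequence.
    features = {f"frac_{aa}": counts[aa] / n for aa in AMINO_ACIDS}

    # Fraction of residues in the (assumed) hydrophobic group.
    features["frac_hydrophobic"] = sum(counts[aa] for aa in HYDROPHOBIC) / n

    # Crude net charge: count of K/R minus count of D/E.
    features["net_charge"] = (
        sum(counts[aa] for aa in POSITIVE) - sum(counts[aa] for aa in NEGATIVE)
    )

    features["length"] = n
    return features


if __name__ == "__main__":
    # Toy sequence used only to exercise the function.
    for name, value in extract_features("ACDKK").items():
        print(name, value)
```

Feature vectors of this kind, one per known bacteriocin or non-bacteriocin sequence, would then be the input to a feature selection step and to classifiers such as the SVM and RF models the abstract mentions.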
Identifiers
pubmed: 37592230
doi: 10.1186/s12859-023-05330-z
pii: 10.1186/s12859-023-05330-z
pmc: PMC10433575
Chemical substances
Bacteriocins
0
Anti-Bacterial Agents
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
313
Copyright information
© 2023. BioMed Central Ltd., part of Springer Nature.