Classification of bioactive peptides: A systematic benchmark of models and encodings.
Bioactive peptide
Functional classification
Machine learning
Sequence encoding
Systematic evaluation
Journal
Computational and structural biotechnology journal
ISSN: 2001-0370
Abbreviated title: Comput Struct Biotechnol J
Country: Netherlands
NLM ID: 101585369
Publication information
Publication date:
Dec 2024
History:
received: 19 Mar 2024
revised: 10 May 2024
accepted: 22 May 2024
medline: 13 Jun 2024
pubmed: 13 Jun 2024
entrez: 13 Jun 2024
Status:
epublish
Abstract
Bioactive peptides are short amino acid chains possessing biological activity and exerting physiological effects relevant to human health. Despite their therapeutic value, their identification remains a major problem, as it mainly relies on time-consuming in vitro tests. While bioinformatic tools for the identification of bioactive peptides are available, they are focused on specific functional classes and have not been systematically tested in realistic settings. To tackle this problem, bioactive peptide sequences and functions were gathered from a variety of databases to generate a unified collection of bioactive peptides from microbial fermentation. This collection was organized into nine functional classes, including some previously studied and some unexplored, such as immunomodulatory, opioid and cardiovascular peptides. After their sequence properties were assessed, four alternative encoding methods were tested in combination with a multitude of machine learning algorithms, from basic classifiers like logistic regression to advanced algorithms like BERT. Tests on a total of 171 models showed that, while some functions are intrinsically easier to detect, no single combination of classifiers and encoders worked universally well for all classes. For this reason, we unified the best individual models for each class and generated CICERON (Classification of bIoaCtive pEptides fRom micrObial fermeNtation), a tool for the functional classification of peptides. State-of-the-art classifiers were found to underperform on our realistic benchmark dataset compared to the models included in CICERON. Altogether, our work provides a tool for real-world peptide classification and can serve as a benchmark for future model development.
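The workflow summarized in the abstract (encode each peptide as a numeric vector, then keep the best encoder and classifier pair per functional class) can be illustrated with a minimal sketch. The abstract does not name the four encodings that were tested, so amino acid composition is used here purely as an assumed, commonly used example. The function names `aac_encode` and `best_model_per_class` are hypothetical and are not part of CICERON.

```python
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac_encode(sequence: str) -> List[float]:
    """Return the 20-dimensional amino acid composition of a peptide.

    Assumed example encoding; not necessarily one used in the paper.
    Non-standard residues are ignored.
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def best_model_per_class(
    scores: Dict[str, Dict[Tuple[str, str], float]]
) -> Dict[str, Tuple[str, str]]:
    """Pick the top-scoring (encoder, classifier) pair for each class.

    `scores` maps a functional class name to a dictionary of
    {(encoder_name, classifier_name): validation_score}.
    """
    return {
        cls: max(candidates, key=candidates.get)
        for cls, candidates in scores.items()
    }
```

Under these assumptions, each class ends up with its own encoder and classifier, which mirrors the abstract's finding that no single combination worked well for all classes.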
Identifiers
pubmed: 38867723
doi: 10.1016/j.csbj.2024.05.040
pii: S2001-0370(24)00186-7
pmc: PMC11168199
Publication types
Journal Article
Languages
eng
Pagination
2442-2452
Copyright information
© 2024 The Authors.
Conflict of interest statement
The authors declare no conflict of interest.