Classification of bioactive peptides: A systematic benchmark of models and encodings.

Keywords: Bioactive peptide; Functional classification; Machine learning; Sequence encoding; Systematic evaluation

Journal

Computational and structural biotechnology journal
ISSN: 2001-0370
Abbreviated title: Comput Struct Biotechnol J
Country: Netherlands
NLM ID: 101585369

Publication information

Publication date:
Dec 2024
History:
received: 19 03 2024
revised: 10 05 2024
accepted: 22 05 2024
medline: 13 06 2024
pubmed: 13 06 2024
entrez: 13 06 2024
Status: epublish

Abstract

Bioactive peptides are short amino acid chains that possess biological activity and exert physiological effects relevant to human health. Despite their therapeutic value, their identification remains a major problem, as it relies mainly on time-consuming in vitro tests. While bioinformatic tools for the identification of bioactive peptides are available, they focus on specific functional classes and have not been systematically tested in realistic settings. To tackle this problem, bioactive peptide sequences and functions were gathered from a variety of databases to generate a unified collection of bioactive peptides from microbial fermentation. This collection was organized into nine functional classes, including some previously studied and some unexplored, such as immunomodulatory, opioid and cardiovascular peptides. After assessing their sequence properties, four alternative encoding methods were tested in combination with a range of machine learning algorithms, from basic classifiers like logistic regression to advanced algorithms like BERT. Tests on a total of 171 models showed that, while some functions are intrinsically easier to detect, no single combination of classifier and encoder worked universally well for all classes. For this reason, we combined the best individual model for each class into CICERON (Classification of bIoaCtive pEptides fRom micrObial fermeNtation), a tool for the functional classification of peptides. State-of-the-art classifiers were found to underperform on our realistic benchmark dataset compared to the models included in CICERON. Altogether, our work provides a tool for real-world peptide classification and can serve as a benchmark for future model development.
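
The abstract does not name the four encodings or the metric used to rank models, so the following is only an illustrative sketch of the two ideas it describes: turning a peptide sequence into a numeric vector, and keeping the best encoder/classifier pair separately for each functional class. Amino acid composition is used here as an assumed stand-in encoding, and the names `aa_composition` and `best_model_per_class`, as well as the shape of the `scores` mapping, are hypothetical and not taken from CICERON.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Encode a peptide as the relative frequency of each standard amino acid.

    Assumed example encoding; the abstract does not specify which four
    encodings were benchmarked.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def best_model_per_class(scores):
    """Pick the highest-scoring (encoder, classifier) pair for each class.

    `scores` is a hypothetical mapping of the form
    {functional_class: {(encoder, classifier): score}}.
    """
    return {
        cls: max(by_model, key=by_model.get)
        for cls, by_model in scores.items()
    }


# Usage: encode an arbitrary example tripeptide into a 20-dimensional vector.
vector = aa_composition("IPP")
```

Selecting a model per class rather than one global model mirrors the abstract's observation that no single classifier and encoder combination worked well for all classes.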

Identifiers

pubmed: 38867723
doi: 10.1016/j.csbj.2024.05.040
pii: S2001-0370(24)00186-7
pmc: PMC11168199

Publication types

Journal Article

Languages

eng

Pagination

2442-2452

Copyright information

© 2024 The Authors.

Conflict of interest statement

The authors declare no conflict of interest.

Authors

Edoardo Bizzotto (E)

Department of Biology, University of Padua, Via U. Bassi 58/b, Padova 35131, Italy.

Guido Zampieri (G)

Department of Biology, University of Padua, Via U. Bassi 58/b, Padova 35131, Italy.

Laura Treu (L)

Department of Biology, University of Padua, Via U. Bassi 58/b, Padova 35131, Italy.

Pasquale Filannino (P)

Department of Soil, Plant and Food Science, University of Bari Aldo Moro, Via G. Amendola 165/a, Bari 70126, Italy.

Raffaella Di Cagno (R)

Faculty of Agricultural, Environmental and Food Sciences, Free University of Bolzano, Piazza Universita, 5, Bolzano 39100, Italy.

Stefano Campanaro (S)

Department of Biology, University of Padua, Via U. Bassi 58/b, Padova 35131, Italy.

MeSH classifications