Structure-aware machine learning strategies for antimicrobial peptide discovery.

AlphaFold2; Explainable machine learning; Oversampling; Peptide design; Protein structure prediction; Structural bias

Journal

Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
25 May 2024
History:
received: 08 02 2024
accepted: 16 05 2024
medline: 26 5 2024
pubmed: 26 5 2024
entrez: 25 5 2024
Status: epublish

Abstract

Machine learning models are revolutionizing our approaches to discovering and designing bioactive peptides. Because they rely heavily on sequence data, these models often lack protein structure awareness. They excel at identifying sequences with a particular biological nature or activity, but they frequently fail to capture the underlying mechanism(s) of action. To address both problems at once, we studied the mechanisms of action and structural landscape of antimicrobial peptides, classified as (i) membrane-disrupting peptides, (ii) membrane-penetrating peptides, and (iii) protein-binding peptides. By analyzing critical features such as dipeptides and physicochemical descriptors, we developed models that predict these categories with high accuracy (86-88%). However, our initial models (1.0 and 2.0) exhibited a bias towards α-helical and coiled structures, which influenced their predictions. To address this structural bias, we implemented subset selection and data reduction strategies. The former gave structure-specific models for three classes of peptides: those likely to fold into α-helices (models 1.1 and 2.1), coils (1.3 and 2.3), or mixed structures (1.4 and 2.4). The latter depleted over-represented structures, leading to the structure-agnostic predictors 1.5 and 2.5. Additionally, our research highlights how sensitive the important features are to the different structure classes across models.
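
The abstract names two ingredients that can be illustrated briefly: dipeptide features and a data reduction step that depletes over-represented structure classes. The following is a minimal sketch under stated assumptions, not the authors' pipeline. The function names (`dipeptide_composition`, `deplete_overrepresented`), the random downsampling to the smallest class, and the toy sequences and structure labels are all hypothetical choices made for illustration.

```python
import random
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence):
    """Return the frequency of each of the 400 dipeptides in a sequence."""
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    # Pairs containing non-standard residues count toward the total
    # but have no slot in the output vector.
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def deplete_overrepresented(records, key, seed=0):
    """Randomly downsample every group to the size of the smallest group.

    Assumption: this is one simple way to deplete over-represented
    classes; the abstract does not specify the exact procedure.
    """
    rng = random.Random(seed)
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record)
    target = min(len(members) for members in groups.values())
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], target))
    return balanced


# Toy records: placeholder sequences and structure labels, not real data.
toy_peptides = [
    {"sequence": "KLAKLAKKLAKLAK", "structure": "helix"},
    {"sequence": "KLLKKLLKKLLK", "structure": "helix"},
    {"sequence": "AKKLAKKLAKKL", "structure": "helix"},
    {"sequence": "GPGSGPGSGP", "structure": "coil"},
    {"sequence": "PGGSPGGSPG", "structure": "coil"},
    {"sequence": "KLAKGPGSKLAK", "structure": "mixed"},
]

balanced = deplete_overrepresented(toy_peptides, key="structure")
features = [dipeptide_composition(p["sequence"]) for p in balanced]
class_sizes = Counter(p["structure"] for p in balanced)
```

In this sketch every structure class ends up with the same number of peptides before feature extraction, so a downstream classifier would see no class more often than another. Whether the published structure-agnostic predictors were balanced in exactly this way is not stated in the abstract.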

Identifiers

pubmed: 38796582
doi: 10.1038/s41598-024-62419-y
pii: 10.1038/s41598-024-62419-y

Chemical substances

Antimicrobial Peptides 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

11995

Grants

Agency: Mexican research council - Consejo Nacional de Humanidades Ciencias y Tecnologías (CONAHCYT)
ID: A1-S-32579
Agency: Rosenkranz Medical Research Award 2021
ID: Biotechnology category

Copyright information

© 2024. The Author(s).

Authors

Mariana D C Aguilera-Puga (MDC)

Department of Biotechnology and Biochemistry, Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV-IPN), Irapuato Unit, 36824, Irapuato, Guanajuato, Mexico.

Fabien Plisson (F)

Department of Biotechnology and Biochemistry, Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV-IPN), Irapuato Unit, 36824, Irapuato, Guanajuato, Mexico. fabien.plisson@cinvestav.mx.
