Structure-aware machine learning strategies for antimicrobial peptide discovery.
AlphaFold2
Explainable machine learning
Oversampling
Peptide design
Protein structure prediction
Structural bias
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
25 May 2024
History:
received: 8 February 2024
accepted: 16 May 2024
medline: 26 May 2024
pubmed: 26 May 2024
entrez: 25 May 2024
Status:
epublish
Abstract
Machine learning models are revolutionizing our approaches to discovering and designing bioactive peptides. These models often lack protein structure awareness because they rely heavily on sequence data. They excel at identifying sequences with a particular biological nature or activity, but they frequently fail to capture the underlying mechanism(s) of action. To address both problems at once, we studied the mechanisms of action and structural landscape of antimicrobial peptides, grouped as (i) membrane-disrupting peptides, (ii) membrane-penetrating peptides, and (iii) protein-binding peptides. By analyzing critical features such as dipeptides and physicochemical descriptors, we developed models that predict these categories with high accuracy (86-88%). However, our initial models (1.0 and 2.0) were biased towards α-helical and coiled structures, which influenced their predictions. To address this structural bias, we implemented subset selection and data reduction strategies. The former gave three structure-specific models for peptides likely to fold into α-helices (models 1.1 and 2.1), coils (1.3 and 2.3), or mixed structures (1.4 and 2.4). The latter depleted over-represented structures, leading to the structure-agnostic predictors 1.5 and 2.5. Additionally, our research highlights the sensitivity of important features to different structure classes across models.
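The two ideas named in the abstract, dipeptide features and depletion of over-represented structure classes, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequences, the structure labels, and the choice to downsample every class at random to the size of the smallest one are all assumptions made for illustration, since the abstract does not say how the data reduction was carried out.

```python
import random
from collections import Counter, defaultdict
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the frequency of each of the 400 dipeptides in a sequence.

    Pairs containing non-standard residues are counted in the total but
    have no slot in the output vector.
    """
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    counts = Counter(pairs)
    total = max(len(pairs), 1)  # avoid division by zero for short input
    return [counts[d] / total for d in DIPEPTIDES]


def deplete_overrepresented(records, key, seed=0):
    """Randomly downsample every group to the size of the smallest group.

    Assumption: plain random undersampling stands in for the paper's
    data reduction strategy, which the abstract does not detail.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    target = min(len(g) for g in groups.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], target))
    return balanced


# Hypothetical toy records; sequences and labels are illustrative only.
peptides = [
    {"seq": "KLAKLAKKLA", "structure": "helix"},
    {"seq": "KLLKKLLKLL", "structure": "helix"},
    {"seq": "AKKLAKLAKK", "structure": "helix"},
    {"seq": "GPGSGPGSGP", "structure": "coil"},
    {"seq": "KLAKGPGSVV", "structure": "mixed"},
    {"seq": "GPGKLLKKLV", "structure": "mixed"},
]

balanced = deplete_overrepresented(peptides, key="structure")
features = [dipeptide_composition(p["seq"]) for p in balanced]
```

Here `balanced` keeps one record per structure class because the smallest class (`coil`) has a single member, and each row of `features` is a 400-dimensional vector that could be passed to any tabular classifier.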
Identifiers
pubmed: 38796582
doi: 10.1038/s41598-024-62419-y
pii: 10.1038/s41598-024-62419-y
Chemical substances
Antimicrobial Peptides
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
11995
Grants
Agency: Mexican research council - Consejo Nacional de Humanidades Ciencias y Tecnologías (CONAHCYT)
ID: A1-S-32579
Agency: Rosenkranz Medical Research Award 2021
ID: Biotechnology category
Copyright information
© 2024. The Author(s).