Machine learning-guided discovery and design of non-hemolytic peptides.


Journal

Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
06 10 2020
History:
received: 19 07 2020
accepted: 18 09 2020
entrez: 7 10 2020
pubmed: 8 10 2020
medline: 9 3 2021
Status: epublish

Abstract

Reducing hurdles to clinical trials without compromising the therapeutic promise of peptide candidates has become an essential step in peptide-based drug design. Machine-learning models are cost-effective and time-saving strategies used to predict biological activities from primary sequences. Their limitations stem from the diversity of peptide sequences and biological information represented within these models. Additional outlier detection methods are needed to set the boundaries for reliable predictions, that is, the applicability domain. Antimicrobial peptides (AMPs) constitute an extensive library of peptides offering promising avenues against antibiotic-resistant infections. Most AMPs in clinical trials are administered topically because of their hemolytic toxicity. Here we developed machine learning models and outlier detection methods that ensure robust predictions for the discovery of AMPs and the design of novel peptides with reduced hemolytic activity. Our best models, gradient boosting classifiers, predicted the hemolytic nature from any peptide sequence with 95-97% accuracy. Nearly 70% of AMPs were predicted as hemolytic peptides. Applying multivariate outlier detection models, we found that 273 AMPs (~9%) could not be predicted reliably. Our combined approach led to the discovery of 34 high-confidence non-hemolytic natural AMPs, the de novo design of 507 non-hemolytic peptides, and guidelines for non-hemolytic peptide design.
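The applicability-domain idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract reports gradient boosting classifiers and multivariate outlier detection models but gives no implementation details. The sketch below swaps in one simple multivariate check, a squared Mahalanobis distance over two hand-picked descriptors (net charge and hydrophobic fraction). The residue groupings, the toy training sequences, the function names, and the chi-square cut-off of 9.21 (the 0.99 quantile for 2 degrees of freedom) are all assumptions made for illustration.

```python
from statistics import mean

# Assumption: coarse residue groupings chosen for illustration only.
HYDROPHOBIC = set("AILMFVWY")
POSITIVE = set("KR")
NEGATIVE = set("DE")

# Assumption: invented toy sequences, not data from the paper.
TOY_TRAINING = [
    "KLAKLAKKLAKLAK", "AAKKLLDE", "GLFDIVKKVV",
    "KWKLFKKI", "GSGDEKRA", "ILVKKAAG",
]


def descriptors(seq):
    """Return (net charge, hydrophobic fraction) for a peptide sequence."""
    seq = seq.upper()
    charge = sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)
    hydrophobic = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    return (float(charge), hydrophobic)


def fit_domain(train_seqs):
    """Estimate the descriptor centre and the inverse 2x2 sample covariance."""
    pts = [descriptors(s) for s in train_seqs]
    n = len(pts)
    mx, my = mean(p[0] for p in pts), mean(p[1] for p in pts)
    sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("singular covariance: descriptors are collinear")
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv


def mahalanobis_sq(seq, centre, inv):
    """Squared Mahalanobis distance of a sequence from the training centre."""
    x, y = descriptors(seq)
    dx, dy = x - centre[0], y - centre[1]
    return dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)


def in_domain(seq, centre, inv, threshold=9.21):
    """True if the sequence lies inside the assumed applicability domain."""
    return mahalanobis_sq(seq, centre, inv) <= threshold


centre, inv_cov = fit_domain(TOY_TRAINING)
for query in ("KWKLFKKI", "DDDDDDDDDDDD"):
    status = "inside" if in_domain(query, centre, inv_cov) else "outside"
    print(query, status, "the applicability domain")
```

In a fuller workflow, a classifier's prediction would be reported as reliable only for sequences that pass such a check; sequences flagged as outside the domain would be set aside rather than labelled.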

Identifiers

pubmed: 33024236
doi: 10.1038/s41598-020-73644-6
pii: 10.1038/s41598-020-73644-6
pmc: PMC7538962

Chemical substances

Pore Forming Cytotoxic Proteins 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

16581

Authors

Fabien Plisson (F)

CONACYT, Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación Y de Estudios Avanzados del IPN, 36824, Irapuato, Guanajuato, Mexico. fabien.plisson@cinvestav.mx.

Obed Ramírez-Sánchez (O)

Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación Y de Estudios Avanzados del IPN, 36824, Irapuato, Guanajuato, Mexico.

Cristina Martínez-Hernández (C)

Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación Y de Estudios Avanzados del IPN, 36824, Irapuato, Guanajuato, Mexico.
