Machine learning-guided discovery and design of non-hemolytic peptides.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date: 06 10 2020
History:
received: 19 07 2020
accepted: 18 09 2020
entrez: 7 10 2020
pubmed: 8 10 2020
medline: 9 3 2021
Status: epublish
Abstract
Reducing hurdles to clinical trials without compromising the therapeutic promise of peptide candidates is becoming an essential step in peptide-based drug design. Machine-learning models are cost-effective, time-saving strategies for predicting biological activities from primary sequences. Their limitations lie in the diversity of peptide sequences and biological information within these models. Additional outlier detection methods are needed to set the boundaries for reliable predictions, namely the applicability domain. Antimicrobial peptides (AMPs) constitute an extensive library of peptides offering promising avenues against antibiotic-resistant infections. Most AMPs in clinical trials are administered topically because of their hemolytic toxicity. Here we developed machine learning models and outlier detection methods that ensure robust predictions for the discovery of AMPs and the design of novel peptides with reduced hemolytic activity. Our best models, gradient boosting classifiers, predicted the hemolytic nature of any peptide sequence with 95–97% accuracy. Nearly 70% of AMPs were predicted to be hemolytic. Applying multivariate outlier detection models, we found that 273 AMPs (~9%) could not be predicted reliably. Our combined approach led to the discovery of 34 high-confidence non-hemolytic natural AMPs, the de novo design of 507 non-hemolytic peptides, and guidelines for non-hemolytic peptide design.
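The abstract treats the applicability domain as an outlier-detection problem on the model's input space: a query peptide whose descriptors lie far from the training data is flagged instead of trusted. The sketch below illustrates that idea under stated assumptions. It uses amino-acid composition as a stand-in descriptor set (not necessarily the descriptors used in the study) and a k-nearest-neighbour distance score, one of the outlier detector families cited in the reference list (Peng & Biao). The class name KnnApplicabilityDomain, the parameters k and percentile, and the toy sequences are hypothetical.

```python
from math import dist
from statistics import quantiles

# Assumed descriptor set: composition over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fraction of each standard amino acid in `seq`."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def knn_score(x, reference, k):
    """Mean Euclidean distance from `x` to its k nearest points in `reference`."""
    nearest = sorted(dist(x, r) for r in reference)[:k]
    return sum(nearest) / len(nearest)


class KnnApplicabilityDomain:
    """Hypothetical kNN-distance applicability domain (illustrative only).

    `percentile` must be an integer between 1 and 99.
    """

    def __init__(self, k=3, percentile=95):
        self.k = k
        self.percentile = percentile

    def fit(self, train_seqs):
        self.train = [composition(s) for s in train_seqs]
        # Leave-one-out scores: each training peptide against all the others.
        scores = [
            knn_score(x, self.train[:i] + self.train[i + 1:], self.k)
            for i, x in enumerate(self.train)
        ]
        # Threshold = chosen percentile of the training score distribution.
        cut_points = quantiles(scores, n=100, method="inclusive")
        self.threshold = cut_points[self.percentile - 1]
        return self

    def is_inside(self, seq):
        """True if `seq` scores no worse than the training threshold."""
        return knn_score(composition(seq), self.train, self.k) <= self.threshold


if __name__ == "__main__":
    train = [
        "KLAKLAKKLAKLAK", "KLLKLLKKLLKLLK", "KWKLFKKIGAVLKVL",
        "GIGKFLKKAKKFGKAFVKILKK", "KKLLKKLLKKLL", "LKKLLKLLKKLLKL",
    ]
    ad = KnnApplicabilityDomain(k=3).fit(train)
    for query in ("KLLKKLLKKLLK", "DDDDDDDDEEEE"):
        print(query, "inside" if ad.is_inside(query) else "outside")
```

A query counts as inside the domain when its mean distance to its k nearest training peptides does not exceed the chosen percentile of the leave-one-out training scores. The study combined several multivariate outlier detectors, so this single-detector rule is only an illustration of the principle.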
Identifiers
pubmed: 33024236
doi: 10.1038/s41598-020-73644-6
pii: 10.1038/s41598-020-73644-6
pmc: PMC7538962
Chemical substances
Pore Forming Cytotoxic Proteins
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
16581
References
Fosgerau, K. & Hoffmann, T. Peptide therapeutics: current status and future directions. Drug Discov. Today 20, 122–128 (2015).
pubmed: 25450771
doi: 10.1016/j.drudis.2014.10.003
Lau, J. L. & Dunn, M. K. Therapeutic peptides: historical perspectives, current development trends, and future directions. Bioorg. Med. Chem. 26, 2700–2707 (2018).
pubmed: 28720325
doi: 10.1016/j.bmc.2017.06.052
Haney, E. F., Straus, S. K. & Hancock, R. E. W. Reassessing the host defense peptide landscape. Front. Chem. 7, 1–22 (2019).
doi: 10.3389/fchem.2019.00043
Fernández de Ullivarri, M., Arbulu, S., Garcia-Gutierrez, E. & Cotter, P. D. Antifungal peptides as therapeutic agents. Front. Cell. Infect. Microbiol. 10, 105 (2020).
pubmed: 32257965
pmcid: 7089922
doi: 10.3389/fcimb.2020.00105
Nyanguile, O. Peptide antiviral strategies as an alternative to treat lower respiratory viral infections. Front. Immunol. 10, 1366 (2019).
pubmed: 31293570
pmcid: 6598224
doi: 10.3389/fimmu.2019.01366
Lacerda, A. F., Pelegrini, P. B., de Oliveira, D. M., Vasconcelos, É. A. R. & Grossi-de-Sá, M. F. Anti-parasitic peptides from arthropods and their application in drug therapy. Front. Microbiol. 7, 1–11 (2016).
doi: 10.3389/fmicb.2016.00091
Windley, M. J. et al. Spider-venom peptides as bioinsecticides. Toxins (Basel) 4, 191–227 (2012).
doi: 10.3390/toxins4030191
Gabernet, G., Müller, A. T., Hiss, J. A. & Schneider, G. Membranolytic anticancer peptides. Medchemcomm 7, 2232–2245 (2016).
doi: 10.1039/C6MD00376A
McGregor, D. Discovering and improving novel peptide therapeutics. Curr. Opin. Pharmacol. 8, 616–619 (2008).
pubmed: 18602024
doi: 10.1016/j.coph.2008.06.002
Lin, Y., Cai, Y., Liu, J., Lin, C. & Liu, X. An advanced approach to identify antimicrobial peptides and their function types for Penaeus through machine learning strategies. BMC Bioinform. 20, 1–10 (2019).
doi: 10.1186/s12859-018-2565-8
Cardoso, M. H. et al. Computer-aided design of antimicrobial peptides: are we generating effective drug candidates? Front. Microbiol. 10, 1–15 (2020).
doi: 10.3389/fmicb.2019.03097
Speck-Planche, A., Kleandrova, V. V., Ruso, J. M. & Dias Soeiro Cordeiro, M. N. First multitarget chemo-bioinformatic model to enable the discovery of antibacterial peptides against multiple gram-positive pathogens. J. Chem. Inf. Model. 56, 588–598 (2016).
pubmed: 26960000
doi: 10.1021/acs.jcim.5b00630
Kleandrova, V. V., Ruso, J. M., Speck-Planche, A. & Dias Soeiro Cordeiro, M. N. Enabling the discovery and virtual screening of potent and safe antimicrobial peptides. Simultaneous prediction of antibacterial activity and cytotoxicity. ACS Comb. Sci. 18, 490–498 (2016).
pubmed: 27280735
doi: 10.1021/acscombsci.6b00063
Munteanu, C. R. et al. Improvement of epitope prediction using peptide sequence descriptors and machine learning. Int. J. Mol. Sci. 20, 4362 (2019).
pmcid: 6770149
doi: 10.3390/ijms20184362
Shoombuatong, W., Schaduangrat, N. & Nantasenamat, C. Unraveling the bioactivity of anticancer peptides as deduced from machine learning. EXCLI J. 17, 734–752 (2018).
pubmed: 30190664
pmcid: 6123611
Gabernet, G. et al. In silico design and optimization of selective membranolytic anticancer peptides. Sci. Rep. 9, 11282 (2019).
pubmed: 31375699
pmcid: 6677754
doi: 10.1038/s41598-019-47568-9
Speck-Planche, A. & Cordeiro, M. N. D. S. Speeding up the virtual design and screening of therapeutic peptides. In Multi-Scale Approaches in Drug Discovery 127–147 (Elsevier, Amsterdam, 2017).
Win, T. S. et al. HemoPred: a web server for predicting the hemolytic activity of peptides. Future Med. Chem. 9, 275–291 (2017).
pubmed: 28211294
doi: 10.4155/fmc-2016-0188
Chaudhary, K. et al. A web server and mobile app for computing hemolytic potency of peptides. Sci. Rep. 6, 22843 (2016).
pubmed: 26953092
pmcid: 4782144
doi: 10.1038/srep22843
Kawashima, S., Ogata, H. & Kanehisa, M. AAindex: amino acid index database. Nucleic Acids Res. 27, 368–369 (1999).
pubmed: 9847231
pmcid: 148186
doi: 10.1093/nar/27.1.368
Hasan, M. M. et al. HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation. Bioinformatics 36, 3350–3356 (2020).
pubmed: 32145017
doi: 10.1093/bioinformatics/btaa160
Timmons, P. B. & Hewage, C. M. HAPPENN is a novel tool for hemolytic activity prediction for therapeutic peptides which employs neural networks. Sci. Rep. 10, 10869 (2020).
pubmed: 32616760
pmcid: 7331684
doi: 10.1038/s41598-020-67701-3
Gautam, A. et al. Hemolytik: a database of experimentally determined hemolytic and non-hemolytic peptides. Nucleic Acids Res. 42, D444–D449 (2014).
pubmed: 24174543
doi: 10.1093/nar/gkt1008
Jungo, F., Bougueleret, L., Xenarios, I. & Poux, S. The UniProtKB/Swiss-Prot Tox-Prot program: a central hub of integrated venom protein data. Toxicon 60, 551–557 (2012).
pubmed: 22465017
pmcid: 3393831
doi: 10.1016/j.toxicon.2012.03.010
Pirtskhalava, M. et al. DBAASP vol 2: an enhanced database of structure and antimicrobial/cytotoxic activity of natural and synthetic peptides. Nucleic Acids Res. 44, D1104–D1112 (2016).
pubmed: 26578581
doi: 10.1093/nar/gkv1174
Wang, G., Li, X. & Wang, Z. APD3: the antimicrobial peptide database as a tool for research and education. Nucleic Acids Res. 44, D1087–D1093 (2016).
pubmed: 26602694
doi: 10.1093/nar/gkv1278
Müller, A. T., Gabernet, G., Hiss, J. A. & Schneider, G. modlAMP: Python for antimicrobial peptides. Bioinformatics 33, 2753–2755 (2017).
pubmed: 28472272
doi: 10.1093/bioinformatics/btx285
Hosmer, D. W., Lemeshow, S. & Sturdivant, R. X. Applied Logistic Regression 3rd edn. (Wiley, Hoboken, 2013).
doi: 10.1002/9781118548387
Cover, T. & Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13, 21–27 (1967).
doi: 10.1109/TIT.1967.1053964
Tharwat, A. Linear vs. quadratic discriminant analysis classifier: a tutorial. Int. J. Appl. Pattern Recognit. 3, 145 (2016).
doi: 10.1504/IJAPR.2016.079050
Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
Breiman, L., Friedman, J. H., Stone, C. J. & Olshen, R. A. Classification and Regression Trees. Wadsworth Statistics/Probability Series (Taylor & Francis, Abingdon, 1984).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
doi: 10.1023/A:1010933404324
Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
doi: 10.1214/aos/1013203451
Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).
doi: 10.1006/jcss.1997.1504
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785–794 (2016).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Guyon, I., Weston, J., Barnhill, S. & Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002).
doi: 10.1023/A:1012487302797
Johnsson, T. A procedure for stepwise regression analysis. Stat. Pap. 33, 21–29 (1992).
doi: 10.1007/BF02925308
Alin, A. Multicollinearity. Wiley Interdiscip. Rev. Comput. Stat. 2, 370–374 (2010).
doi: 10.1002/wics.84
Mahalanobis, P. C. On the generalized distance in statistics. Proc. Natl. Inst. Sci. India 2, 49–55 (1936).
Breunig, M. M., Kriegel, H. P., Ng, R. T. & Sander, J. LOF: identifying density-based local outliers. SIGMOD Rec. 29, 93–104 (2000).
He, Z., Xu, X. & Deng, S. Discovering cluster-based local outliers. Pattern Recognit. Lett. 24, 1641–1650 (2003).
doi: 10.1016/S0167-8655(03)00003-5
Goldstein, M. & Dengel, A. Histogram-based outlier score (HBOS): a fast unsupervised anomaly detection algorithm. In KI-2012 Poster and Demo Track 59–63 (2012).
Peng, Y. & Biao, H. KNN based outlier detection algorithm in large dataset. In 2008 International Workshop on Education Technology and Training & 2008 International Workshop on Geoscience and Remote Sensing, ETT GRS, vol 1, 611–613 (2008).
Liu, F. T., Ting, K. M. & Zhou, Z.-H. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining (ICDM) 413–422 (2008).
Lazarevic, A. & Kumar, V. Feature bagging for outlier detection. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 157–166 (2005).
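Kriegel, H.-P., Schubert, M. & Zimek, A. Angle-based outlier detection in high-dimensional data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 444–452 (2008).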
Zhao, Y., Nasrullah, Z. & Li, Z. PyOD: a python toolbox for scalable outlier detection. J. Mach. Learn. Res. 20, 1–7 (2019).
Lee, J. A., Peluffo-Ordóñez, D. H. & Verleysen, M. Multi-scale similarities in stochastic neighbour embedding: reducing dimensionality while preserving both local and global structure. Neurocomputing 169, 246–261 (2015).
doi: 10.1016/j.neucom.2014.12.095
Kraemer, G., Reichstein, M. & Mahecha, M. D. dimRed and coRanking-unifying dimensionality reduction in R. R J. 10, 342 (2018).
doi: 10.32614/RJ-2018-039
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2020).
RStudio Team. RStudio: Integrated Development for R. RStudio, PBC, Boston, MA. https://www.rstudio.com/ (2020).
Moore, M. L. Medicinal chemistry. Ind. Eng. Chem. 43, 577–588 (1951).
doi: 10.1021/ie50495a015
Zimek, A., Schubert, E. & Kriegel, H. P. A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Anal. Data Min. 5, 363–387 (2012).
doi: 10.1002/sam.11161
Bartels, E. J. H., Dekker, D. & Amiche, M. Dermaseptins, multifunctional antimicrobial peptides: a review of their pharmacology, effectivity, mechanism of action, and possible future directions. Front. Pharmacol. 10, 1–11 (2019).
doi: 10.3389/fphar.2019.01421
Zhou, J. G. et al. Molecular cloning and characterization of two novel hepcidins from orange-spotted grouper, Epinephelus coioides. Fish Shellfish Immunol. 30, 559–568 (2011).
pubmed: 21145974
doi: 10.1016/j.fsi.2010.11.021
Sitaram, N., Subbalakshmi, C., Krishnakumari, V. & Nagaraj, R. Identification of the region that plays an important role in determining antibacterial activity of bovine seminalplasmin. FEBS Lett. 400, 289–292 (1997).
pubmed: 9009216
doi: 10.1016/S0014-5793(96)01406-8
Li, J. et al. Anti-infection peptidomics of amphibian skin. Mol. Cell. Proteomics 6, 882–894 (2007).
pubmed: 17272268
doi: 10.1074/mcp.M600334-MCP200
Conlon, J. M. et al. Host defense peptides in skin secretions of the Oregon spotted frog Rana pretiosa: implications for species resistance to chytridiomycosis. Dev. Comp. Immunol. 35, 644–649 (2011).
pubmed: 21295070
doi: 10.1016/j.dci.2011.01.017
Marani, M. M. et al. Characterization and biological activities of ocellatin peptides from the skin secretion of the frog Leptodactylus pustulatus. J. Nat. Prod. 78, 1495–1504 (2015).
pubmed: 26107622
doi: 10.1021/np500907t
Zohrab, F., Askarian, S., Jalili, A. & Kazemi Oskuee, R. Biological properties, current applications and potential therapeutic applications of brevinin peptide superfamily. Int. J. Pept. Res. Ther. 25, 39–48 (2019).
pubmed: 32214928
doi: 10.1007/s10989-018-9723-8
Lai, R. et al. Antimicrobial peptides from skin secretions of Chinese red belly toad Bombina maxima. Peptides 23, 427–435 (2002).
pubmed: 11835991
doi: 10.1016/S0196-9781(01)00641-6
Zhang, X.-J. et al. Distinctive structural hallmarks and biological activities of the multiple cathelicidin antimicrobial peptides in a primitive teleost fish. J. Immunol. 194, 4974–4987 (2015).
pubmed: 25876762
doi: 10.4049/jimmunol.1500182
Couillault, C. et al. TLR-independent control of innate immunity in Caenorhabditis elegans by the TIR domain adaptor protein TIR-1, an ortholog of human SARM. Nat. Immunol. 5, 488–494 (2004).
pubmed: 15048112
doi: 10.1038/ni1060
Lim, M.-P., Firdaus-Raih, M. & Nathan, S. Nematode peptides with host-directed anti-inflammatory activity rescue Caenorhabditis elegans from a Burkholderia pseudomallei infection. Front. Microbiol. 7, 1436 (2016).
pubmed: 27672387
pmcid: 5019075
Kumar, V., Kumar, R., Agrawal, P., Patiyal, S. & Raghava, G. P. S. A method for predicting hemolytic potency of chemically modified peptides from its structure. Front. Pharmacol. 11, 1–8 (2020).
doi: 10.3389/fphar.2020.00001
Seelig, J. Thermodynamics of lipid-peptide interactions. Biochim. Biophys. Acta Biomembr. 1666, 40–50 (2004).
doi: 10.1016/j.bbamem.2004.08.004
Guimarães, C. R. W., Mathiowetz, A. M., Shalaeva, M., Goetz, G. & Liras, S. Use of 3D properties to characterize beyond rule-of-5 property space for passive permeation. J. Chem. Inf. Model. 52, 882–890 (2012).
pubmed: 22394163
doi: 10.1021/ci300010y
Organization for Economic Cooperation and Development (OECD). Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models (2007).
Tropsha, A. Best practices for QSAR model development, validation, and exploitation. Mol. Inform. 29, 476–488 (2010).
pubmed: 27463326
doi: 10.1002/minf.201000061
Zheng, S. et al. In silico prediction of hemolytic toxicity on the human erythrocytes for small molecules by machine-learning and genetic algorithm. J. Med. Chem. 63, 6499–6512 (2020).
pubmed: 31282671
doi: 10.1021/acs.jmedchem.9b00853
Zheng, S. et al. Quantitative prediction of hemolytic toxicity for small molecules and their potential hemolytic fragments by machine learning and recursive fragmentation methods. J. Chem. Inf. Model. 60, 3231–3245 (2020).
pubmed: 32364718
doi: 10.1021/acs.jcim.0c00102