A Machine Learning Approach for Drug-target Interaction Prediction using Wrapper Feature Selection and Class Balancing.

MeSH terms: Algorithms; Amino Acid Sequence; Computer Simulation; Databases, Chemical; Databases, Protein; Drug Development / methods; Drug Discovery / methods; Enzymes / chemistry; Ion Channels / chemistry; Machine Learning; Pharmaceutical Preparations / chemistry; Receptors, Cytoplasmic and Nuclear / chemistry; Receptors, G-Protein-Coupled / chemistry

Keywords: Dipeptide Composition; Drug-Target Interaction; Ensemble Learning; Feature Selection; Molecular Descriptors

Journal

Molecular informatics

ISSN: 1868-1751

Abbreviated title: Mol Inform

Country: Germany

NLM ID: 101529315

Publication information

Publication date:
May 2020

History:

received: 31 May 2019

accepted: 28 January 2020

pubmed: 1 February 2020

medline: 1 May 2021

entrez: 1 February 2020

Status: ppublish

Abstract

Drug-target interactions (DTIs) play a crucial role in drug discovery, drug repositioning, and understanding drug side effects, which helps to identify new therapeutic profiles for various diseases. However, the exponential growth of genomic and drug data makes it difficult to identify new associations between drugs and targets. Computational methods can accelerate the DTI identification process. Typically, datasets of known DTIs are used to train a classifier to predict new DTIs, and such datasets often suffer from class imbalance. In this study, we therefore address two challenges posed by such datasets, namely class imbalance and high dimensionality, to develop a predictive model for DTI prediction. The study covers four protein classes: Enzyme, Ion Channel, G Protein-Coupled Receptor (GPCR), and Nuclear Receptor. We encoded each target protein sequence by its dipeptide composition and each drug with molecular descriptors. A machine learning approach combining wrapper feature selection with the synthetic minority oversampling technique (SMOTE) is used to predict DTIs. At best, the ensemble approach achieved accuracies of 95.9%, 93.4%, 90.8%, and 90.6% and precisions of 96.3%, 92.8%, 90.1%, and 90.2% on the Enzyme, Ion Channel, GPCR, and Nuclear Receptor datasets, respectively, when evaluated with 10-fold cross-validation excluding SMOTE samples. Furthermore, our method could predict new drug-target interactions not contained in the training dataset. The features selected by wrapper feature selection may be important for understanding DTIs in the protein categories studied. Based on our evaluation, the proposed method can be used for understanding and identifying new drug-target interactions.
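As an illustration of the protein encoding mentioned in the abstract, the following is a minimal sketch of a dipeptide composition vector: the fraction of each of the 400 ordered amino acid pairs among all adjacent pairs in a sequence. The function name `dipeptide_composition`, the alphabetical ordering of the 20 standard residues, and the choice to skip pairs containing non-standard residues are assumptions made for this example, not details taken from the article or its package.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); ordering is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional list of dipeptide frequencies for a sequence."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    # Slide a window of width 2 over the sequence.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Assumption: pairs with non-standard residues (e.g. "X") are skipped.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    # Normalise by the number of counted pairs so the vector sums to 1.
    return [counts[d] / total for d in DIPEPTIDES]
```

For example, `dipeptide_composition("ACAC")` counts the pairs AC, CA, and AC, so the entry for "AC" is 2/3 and the entry for "CA" is 1/3. In a pipeline like the one described, such a vector would be concatenated with the drug's descriptor values to form one feature vector per drug-target pair.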
We provide a standalone package at https://github.com/shwetagithub1/predDTI that returns DTI predictions for new query drug-target pairs supplied by the user.
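The class-balancing step named in the abstract can be sketched as follows. This is a simplified SMOTE variant, assuming Euclidean distance and random selection of base points; it is not the authors' implementation. The helper `balance_training_fold` reflects one plausible reading of "evaluated excluding SMOTE samples": synthetic points are added to the training fold only, so each held-out fold contains original samples alone. The function names, the `minority_label` parameter, and the default `k=5` are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest to the base first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_training_fold(train_X, train_y, minority_label=1, k=5, seed=0):
    """Oversample the minority class of a training fold up to the majority size."""
    minority = [x for x, y in zip(train_X, train_y) if y == minority_label]
    n_majority = len(train_y) - len(minority)
    extra = smote(minority, max(0, n_majority - len(minority)), k=k, seed=seed)
    # Original lists are not modified; the test fold is never passed in here.
    return train_X + extra, train_y + [minority_label] * len(extra)
```

In a 10-fold cross-validation loop, `balance_training_fold` would be called on the nine training folds in each round, and the fitted classifier would then be scored on the untouched tenth fold.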

Identifiers

DOI: 10.1002/minf.201900062

PMID: 32003548

Chemical substances

Enzymes 0

Ion Channels 0

Pharmaceutical Preparations 0

Receptors, Cytoplasmic and Nuclear 0

Receptors, G-Protein-Coupled 0

Publication types

Journal Article Research Support, Non-U.S. Gov't

Languages

eng

Pagination

e1900062

Authors

Shweta Redkar (S)

Sukanta Mondal (S)

Alex Joseph (A)

K S Hareesha (KS)
