A Machine Learning Approach for Drug-target Interaction Prediction using Wrapper Feature Selection and Class Balancing.


Journal

Molecular informatics
ISSN: 1868-1751
Abbreviated title: Mol Inform
Country: Germany
NLM ID: 101529315

Publication information

Publication date:
05 2020
History:
received: 31 05 2019
accepted: 28 01 2020
pubmed: 1 2 2020
medline: 1 5 2021
entrez: 1 2 2020
Status: ppublish

Abstract

Drug-target interaction (DTI) plays a crucial role in drug discovery, drug repositioning, and understanding drug side effects, which helps to identify new therapeutic profiles for various diseases. However, the exponential growth of genomic and drug data makes it difficult to identify new associations between drugs and targets. Computational methods are therefore used to accelerate DTI identification. Typically, available data sources of known DTIs are used to train a classifier to predict new DTIs. Such datasets often suffer from class imbalance. In this study we address two challenges posed by such datasets, namely class imbalance and high dimensionality, to develop a predictive model for DTI. The study covers four protein classes: Enzyme, Ion Channel, G Protein-Coupled Receptor (GPCR), and Nuclear Receptor. We encoded each target protein sequence using dipeptide composition and each drug with a molecular descriptor. A machine learning approach combining wrapper feature selection and the synthetic minority oversampling technique (SMOTE) is employed to predict DTIs. At best, the ensemble approach achieved accuracies of 95.9 %, 93.4 %, 90.8 %, and 90.6 % and precisions of 96.3 %, 92.8 %, 90.1 %, and 90.2 % on the Enzyme, Ion Channel, GPCR, and Nuclear Receptor datasets, respectively, when evaluated with 10-fold cross-validation excluding SMOTE samples. Furthermore, our method could predict new drug-target interactions not contained in the training dataset. The features selected by wrapper feature selection may be important for understanding DTI in the protein categories studied. Based on our evaluation, the proposed method can be used for understanding and identifying new drug-target interactions.
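As an illustration of the protein encoding named in the abstract, dipeptide composition is commonly computed as the fraction of each of the 400 ordered amino-acid pairs among the adjacent residue pairs of a sequence. The sketch below is a minimal version under that common definition, not the authors' implementation; the function name `dipeptide_composition` and the choice to skip pairs containing non-standard residues are assumptions made for this example.

```python
from itertools import product

# The 20 standard amino acids; their ordered pairs give 400 dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide fractions for a protein sequence.

    Assumption for this sketch: pairs containing a non-standard residue
    (for example 'X') are skipped, and fractions are taken over the
    remaining valid pairs.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    # Slide a window of width 2 over the sequence.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # No valid adjacent pair (sequence too short or all non-standard).
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

Each target then becomes a fixed-length 400-dimensional vector regardless of sequence length, which is what allows it to be concatenated with a drug descriptor vector and passed to a classifier.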
We provide readers with a standalone package, available at https://github.com/shwetagithub1/predDTI, that returns DTI predictions for new query drug-target pairs.
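The class-balancing step named in the abstract, SMOTE, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below shows only that core idea in plain Python; it is not the code in the predDTI package, and the function name `smote`, its parameters, and the fixed random seed are assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors.

    Hypothetical helper for illustration: each synthetic point lies on the
    line segment between a randomly chosen minority sample and one of its
    k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

The abstract reports results "excluding SMOTE samples". One common way to achieve that is to add synthetic points only to the training portion of each cross-validation fold and score on original samples alone; whether the authors followed exactly this procedure is not stated in the abstract.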

Identifiers

pubmed: 32003548
doi: 10.1002/minf.201900062

Chemical substances

Enzymes 0
Ion Channels 0
Pharmaceutical Preparations 0
Receptors, Cytoplasmic and Nuclear 0
Receptors, G-Protein-Coupled 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e1900062

Copyright information

© 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

Authors

Shweta Redkar (S)

Department of Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India.

Sukanta Mondal (S)

Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K. Birla Goa Campus, 403726, Zuarinagar, Goa, India.

Alex Joseph (A)

Department of Pharmaceutical Chemistry, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India.

K S Hareesha (KS)

Department of Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India.
