A Machine Learning Approach for Drug-target Interaction Prediction using Wrapper Feature Selection and Class Balancing.
MeSH terms
Algorithms
Amino Acid Sequence
Computer Simulation
Databases, Chemical
Databases, Protein
Drug Development / methods
Drug Discovery / methods
Enzymes / chemistry
Ion Channels / chemistry
Machine Learning
Pharmaceutical Preparations / chemistry
Receptors, Cytoplasmic and Nuclear / chemistry
Receptors, G-Protein-Coupled / chemistry
Keywords
Dipeptide Composition
Drug-Target Interaction
Ensemble Learning
Feature Selection
Molecular Descriptors
Journal
Molecular informatics
ISSN: 1868-1751
Abbreviated title: Mol Inform
Country: Germany
NLM ID: 101529315
Publication information
Publication date: May 2020
History:
received: 31 May 2019
accepted: 28 January 2020
pubmed: 1 February 2020
medline: 1 May 2021
entrez: 1 February 2020
Status: ppublish
Abstract
Drug-target interaction (DTI) plays a crucial role in drug discovery, drug repositioning and understanding drug side effects, which helps to identify new therapeutic profiles for various diseases. However, the exponential growth of genomic and drug data makes it difficult to identify new associations between drugs and targets. Computational methods are therefore used to accelerate the DTI identification process. Typically, data sources of known DTIs are used to train a classifier to predict new DTIs, and such datasets often suffer from class imbalance. In this study, we therefore address two challenges of such datasets, i.e., class imbalance and high dimensionality, to develop a predictive model for DTI prediction. The study covers four protein classes: Enzyme, Ion Channel, G Protein-Coupled Receptor (GPCR) and Nuclear Receptor. We encoded target protein sequences using dipeptide composition and drugs using molecular descriptors. A machine learning approach is employed to predict DTIs using wrapper feature selection and the synthetic minority oversampling technique (SMOTE). Evaluated with 10-fold cross-validation excluding SMOTE samples, the ensemble approach achieved best accuracies of 95.9 %, 93.4 %, 90.8 % and 90.6 % and precisions of 96.3 %, 92.8 %, 90.1 % and 90.2 % on the Enzyme, Ion Channel, GPCR and Nuclear Receptor datasets, respectively. Furthermore, our method could predict new drug-target interactions not contained in the training dataset. The features selected by wrapper feature selection may be important for understanding DTIs in the protein classes studied. Based on our evaluation, the proposed method can be used to understand and identify new drug-target interactions.
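The abstract states that target sequences are encoded by dipeptide composition. A minimal Python sketch of that encoding follows, assuming the common definition (the frequency of each of the 400 ordered pairs of adjacent standard amino acids, normalised by the number of pairs counted) and assuming that pairs containing non-standard residues are skipped. The helper names `dipeptide_composition` and `pair_features` are hypothetical and are not taken from the predDTI package; the `drug_descriptors` argument is a placeholder for whatever molecular descriptors are computed for the drug.

```python
from itertools import product
from typing import List, Sequence

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides in a fixed order: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> List[float]:
    """Return the 400-dimensional dipeptide composition of a protein sequence.

    Each entry is the count of one dipeptide divided by the total number of
    overlapping dipeptides made of standard residues (assumption: pairs that
    contain non-standard residues such as X are skipped).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def pair_features(protein_seq: str, drug_descriptors: Sequence[float]) -> List[float]:
    """Hypothetical drug-target pair vector: protein features + drug descriptors."""
    return dipeptide_composition(protein_seq) + list(drug_descriptors)
```

For example, `dipeptide_composition("MKKLL")` counts the four overlapping pairs MK, KK, KL and LL, so each of those entries is 0.25 and the remaining 396 entries are 0.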
We provide readers with a standalone package, available at https://github.com/shwetagithub1/predDTI, that provides DTI predictions for new query drug-target pairs.
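The abstract reports results from 10-fold cross-validation that exclude SMOTE samples from evaluation. One way to honour that constraint is to oversample only inside each training split, so that every test fold contains original samples only; otherwise synthetic points interpolated from a test sample could leak into training and inflate the scores. The Python sketch below illustrates this under stated assumptions: label 1 marks a known interaction and is the minority class, the minority class is oversampled up to the size of the majority class, and `nearest_centroid_fit` is a hypothetical stand-in classifier, not the paper's ensemble or its wrapper-selected feature set. All function names are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# A fit function trains on (X, y) and returns a predict(x) -> label callable.
Fit = Callable[[List[Vector], List[int]], Callable[[Vector], int]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_new: int, k: int,
          rng: random.Random) -> List[Vector]:
    """SMOTE-style oversampling: each synthetic sample lies on the segment
    between a random minority sample x and one of its k nearest minority
    neighbours z, i.e. x + gap * (z - x) with gap drawn from [0, 1)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = max(1, min(k, len(minority) - 1))
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: sq_dist(x, minority[j]))
        z = minority[rng.choice(others[:k])]
        gap = rng.random()
        synthetic.append([xi + gap * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic


def nearest_centroid_fit(X: List[Vector], y: List[int]) -> Callable[[Vector], int]:
    """Hypothetical stand-in classifier (not the paper's ensemble)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def cv_with_smote(X: List[Vector], y: List[int], fit: Fit, n_folds: int = 10,
                  k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Stratified n-fold CV with SMOTE applied to the training folds only.

    Test folds hold original samples only. Returns (accuracy, precision)
    pooled over all folds, treating label 1 (interaction) as positive."""
    rng = random.Random(seed)
    # Stratified assignment: shuffle each class, then deal indices to folds.
    fold_of = [0] * len(y)
    for label in set(y):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_of[i] = pos % n_folds
    correct = tp = fp = 0
    for f in range(n_folds):
        train = [i for i in range(len(y)) if fold_of[i] != f]
        test = [i for i in range(len(y)) if fold_of[i] == f]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        minority = [x for x, t in zip(X_tr, y_tr) if t == 1]
        n_majority = len(y_tr) - len(minority)
        # Oversample the training split only; the test fold is untouched.
        if 2 <= len(minority) < n_majority:
            synthetic = smote(minority, n_majority - len(minority), k, rng)
            X_tr += synthetic
            y_tr += [1] * len(synthetic)
        predict = fit(X_tr, y_tr)
        for i in test:  # original samples only
            p = predict(X[i])
            correct += int(p == y[i])
            tp += int(p == 1 and y[i] == 1)
            fp += int(p == 1 and y[i] != 1)
    accuracy = correct / len(y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, precision
```

Any `fit` function that returns a `predict(x)` callable can replace the stand-in classifier; accuracy and precision are pooled over all test folds, each sample being tested exactly once.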
Identifiers
pubmed: 32003548
doi: 10.1002/minf.201900062
Chemical substances
Enzymes (registry number: 0)
Ion Channels (registry number: 0)
Pharmaceutical Preparations (registry number: 0)
Receptors, Cytoplasmic and Nuclear (registry number: 0)
Receptors, G-Protein-Coupled (registry number: 0)
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e1900062
Copyright information
© 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.