Drug Target Identification with Machine Learning: How to Choose Negative Examples.
chemogenomic
drug discovery
false positive predictions
learning bias
machine learning
negative examples
random forests
support vector machines
target identification
Journal
International journal of molecular sciences
ISSN: 1422-0067
Abbreviated title: Int J Mol Sci
Country: Switzerland
NLM ID: 101092791
Publication information
Publication date:
12 May 2021
History:
received: 29/03/2021
revised: 30/04/2021
accepted: 07/05/2021
entrez: 2/6/2021
pubmed: 3/6/2021
medline: 11/6/2021
Status:
epublish
Abstract
Identifying the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine learning algorithms can accelerate this search by limiting the number of experiments required. However, the drug-target interaction databases used for training are strongly statistically biased, which leads to a high number of false positives and thus increases the time and cost of experimental validation campaigns. To minimize the number of false positives among predicted targets, we propose a new scheme for choosing negative examples, in which each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three specific drugs and, more globally, for 200 approved drugs. For both the three detailed drug examples and the larger set of 200 drugs, training with the proposed scheme for choosing negative examples improved target prediction results: the average number of false positives among the top-ranked predicted targets decreased and, overall, the rank of the true targets improved. Our method corrects the statistical bias of the databases and reduces the number of false positive predictions, and therefore the number of useless experiments potentially undertaken.
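The balancing constraint described in the abstract (each drug and each protein appearing as often among negative examples as among positive ones) can be illustrated with a short sketch. This is not the authors' algorithm, which the abstract does not detail; it is a simple greedy heuristic written under that one stated constraint. The function name `balanced_negatives`, the `seed` parameter, and the toy drug and protein labels are all hypothetical, and the heuristic may leave some quotas unmet when no valid pair remains.

```python
import random
from collections import Counter


def balanced_negatives(positives, seed=0):
    """Greedy sketch (an assumption, not the published method).

    Picks (drug, protein) pairs absent from `positives` so that each drug and
    each protein occurs as often in the negatives as in the positives.
    Returns the chosen negatives and any quotas that could not be filled.
    """
    rng = random.Random(seed)
    known = set(positives)
    # Remaining number of negatives each drug / protein still needs.
    drug_quota = Counter(d for d, _ in known)
    prot_quota = Counter(p for _, p in known)
    negatives = []
    # Proteins with the most positives are the hardest to balance: do them first.
    for prot, need in sorted(prot_quota.items(), key=lambda kv: (-kv[1], kv[0])):
        candidates = sorted(
            d for d in drug_quota
            if drug_quota[d] > 0 and (d, prot) not in known
        )
        rng.shuffle(candidates)
        # Stable sort: prefer drugs with the largest remaining quota,
        # keeping the shuffled order among ties.
        candidates.sort(key=lambda d: -drug_quota[d])
        for drug in candidates[:need]:
            negatives.append((drug, prot))
            drug_quota[drug] -= 1
            prot_quota[prot] -= 1
    unmet = {
        "drugs": {d: q for d, q in drug_quota.items() if q > 0},
        "proteins": {p: q for p, q in prot_quota.items() if q > 0},
    }
    return negatives, unmet


# Toy data with made-up identifiers, for illustration only.
positives = [("d1", "p1"), ("d1", "p2"), ("d2", "p2"), ("d3", "p3"), ("d4", "p4")]
negatives, unmet = balanced_negatives(positives)
print(negatives, unmet)
```

Compared with sampling negatives uniformly at random from unlabeled pairs, this kind of selection keeps frequently studied drugs and proteins from being over-represented among the positive examples only, which is the bias the abstract attributes to the training databases.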
Identifiers
pubmed: 34066072
pii: ijms22105118
doi: 10.3390/ijms22105118
pmc: PMC8151112
Chemical substances
Pharmaceutical Preparations
0
Proteins
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: Vaincre la Mucoviscidose
ID: RF20190502488