Drug Target Identification with Machine Learning: How to Choose Negative Examples.

Keywords: chemogenomic; drug discovery; false positive predictions; learning bias; machine learning; negative examples; random forests; support vector machines; target identification

Journal

International journal of molecular sciences
ISSN: 1422-0067
Abbreviated title: Int J Mol Sci
Country: Switzerland
NLM ID: 101092791

Publication information

Publication date:
12 May 2021
History:
received: 29 March 2021
revised: 30 April 2021
accepted: 07 May 2021
entrez: 02 June 2021
pubmed: 03 June 2021
medline: 11 June 2021
Status: epublish

Abstract

Identification of the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine learning algorithms can help accelerate this search, limiting the number of required experiments. However, the drug-target interaction databases used for training exhibit strong statistical bias, which leads to many false positives and therefore increases the time and cost of experimental validation campaigns. To minimize the number of false positives among predicted targets, we propose a new scheme for choosing negative examples, so that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three specific drugs and, more globally, for 200 approved drugs. For both the three detailed drug examples and the larger set of 200 drugs, training with the proposed scheme for choosing negative examples improved target prediction: the average number of false positives among the top-ranked predicted targets decreased and, overall, the rank of the true targets improved. Our method corrects the statistical bias of the databases and reduces the number of false positive predictions, and therefore the number of useless experiments potentially undertaken.
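The abstract states the balance constraint on negative examples but not how to satisfy it. The sketch below shows one possible way, not the authors' published algorithm: a randomized stub-matching heuristic (an assumption of this example) that repeats each drug and each protein as many times as it occurs among known interactions, then pairs them so that every drug and every protein appears among negatives exactly as often as among positives. It also assumes that any unlabeled drug-protein pair may serve as a negative. The function name `balanced_negatives` and its parameters `max_tries` and `seed` are hypothetical.

```python
import random
from collections import Counter


def balanced_negatives(positives, max_tries=1000, seed=0):
    """Hypothetical sketch: draw negative (drug, protein) pairs so that each
    drug and each protein occurs as often among negatives as among positives.

    Assumption: any pair absent from `positives` is treated as a negative,
    although in practice it may hide an unknown true interaction.
    """
    positives = set(positives)
    drug_deg = Counter(d for d, _ in positives)
    prot_deg = Counter(p for _, p in positives)
    rng = random.Random(seed)
    # One "stub" per occurrence; sorted so results depend only on `seed`.
    drug_stubs = [d for d, k in sorted(drug_deg.items()) for _ in range(k)]
    prot_stubs = [p for p, k in sorted(prot_deg.items()) for _ in range(k)]
    for _ in range(max_tries):
        rng.shuffle(drug_stubs)
        remaining = prot_stubs[:]
        rng.shuffle(remaining)
        negatives = set()
        for d in drug_stubs:
            # Give this drug the first remaining protein that forms a new,
            # non-positive pair.
            for i, p in enumerate(remaining):
                if (d, p) not in positives and (d, p) not in negatives:
                    negatives.add((d, p))
                    remaining.pop(i)
                    break
            else:
                break  # dead end for this drug: abandon this attempt
        else:
            return negatives  # every drug stub was matched
    raise RuntimeError("no balanced negative set found; increase max_tries")
```

For example, with known interactions A-P1, A-P2, B-P3 and B-P4, the only balanced choice is A-P3, A-P4, B-P1 and B-P2, and the function returns exactly that set. Because the heuristic is greedy, it can fail or need many retries when the interaction matrix is dense; a balanced negative set may not even exist in that case.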

Identifiers

pubmed: 34066072
pii: ijms22105118
doi: 10.3390/ijms22105118
pmc: PMC8151112

Chemical substances

Pharmaceutical Preparations 0
Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: Vaincre la Mucoviscidose
ID: RF20190502488

Authors

Matthieu Najm (M)

Center for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France.
Institut Curie, 75248 Paris, France.
INSERM U900, 75428 Paris, France.

Chloé-Agathe Azencott (CA)

Center for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France.
Institut Curie, 75248 Paris, France.
INSERM U900, 75428 Paris, France.

Benoit Playe (B)

Center for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France.
Institut Curie, 75248 Paris, France.
INSERM U900, 75428 Paris, France.

Véronique Stoven (V)

Center for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France.
Institut Curie, 75248 Paris, France.
INSERM U900, 75428 Paris, France.

MeSH classifications