MF-PCBA: Multifidelity High-Throughput Screening Benchmarks for Drug Discovery and Machine Learning.


Journal

Journal of chemical information and modeling
ISSN: 1549-960X
Abbreviated title: J Chem Inf Model
Country: United States
NLM ID: 101230060

Publication information

Publication date:
8 May 2023
History:
medline: 9 May 2023
pubmed: 15 April 2023
entrez: 14 April 2023
Status: ppublish

Abstract

High-throughput screening (HTS), as one of the key techniques in drug discovery, is frequently used to identify promising drug candidates in a largely automated and cost-effective way. One of the necessary conditions for successful HTS campaigns is a large and diverse compound library, enabling hundreds of thousands of activity measurements per project. Such collections of data hold great promise for computational and experimental drug discovery efforts, especially when leveraged in combination with modern deep learning techniques, and can potentially lead to improved drug activity predictions and cheaper and more effective experimental design. However, existing collections of machine-learning-ready public datasets do not exploit the multiple data modalities present in real-world HTS projects. Thus, the largest fraction of experimental measurements, corresponding to hundreds of thousands of "noisy" activity values from primary screening, are effectively ignored in the majority of machine learning models of HTS data. To address these limitations, we introduce Multifidelity PubChem BioAssay (MF-PCBA), a curated collection of 60 datasets that includes two data modalities for each dataset, corresponding to primary and confirmatory screening, an aspect that we call

Identifiers

pubmed: 37058588
doi: 10.1021/acs.jcim.2c01569
pmc: PMC10170507
Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

2667-2678

Authors

David Buterez (D)

Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, U.K.

Jon Paul Janet (JP)

Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 50 Gothenburg, Sweden.

Steven J Kiddle (SJ)

Data Science & Advanced Analytics, Data Science & Artificial Intelligence, R&D, AstraZeneca, Cambridge CB2 8PA, U.K.

Pietro Liò (P)

Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, U.K.
