MF-PCBA: Multifidelity High-Throughput Screening Benchmarks for Drug Discovery and Machine Learning.
Journal
Journal of chemical information and modeling
ISSN: 1549-960X
Abbreviated title: J Chem Inf Model
Country: United States
NLM ID: 101230060
Publication information
Publication date:
8 May 2023
History:
medline: 9 May 2023
pubmed: 15 April 2023
entrez: 14 April 2023
Status:
ppublish
Abstract
High-throughput screening (HTS), as one of the key techniques in drug discovery, is frequently used to identify promising drug candidates in a largely automated and cost-effective way. One of the necessary conditions for successful HTS campaigns is a large and diverse compound library, enabling hundreds of thousands of activity measurements per project. Such collections of data hold great promise for computational and experimental drug discovery efforts, especially when leveraged in combination with modern deep learning techniques, and can potentially lead to improved drug activity predictions and cheaper and more effective experimental design. However, existing collections of machine-learning-ready public datasets do not exploit the multiple data modalities present in real-world HTS projects. Thus, the largest fraction of experimental measurements, corresponding to hundreds of thousands of "noisy" activity values from primary screening, are effectively ignored in the majority of machine learning models of HTS data. To address these limitations, we introduce Multifidelity PubChem BioAssay (MF-PCBA), a curated collection of 60 datasets that includes two data modalities for each dataset, corresponding to primary and confirmatory screening, an aspect that we call
Identifiers
pubmed: 37058588
doi: 10.1021/acs.jcim.2c01569
pmc: PMC10170507
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
2667-2678