Merging bioactivity predictions from cell morphology and chemical fingerprint models using similarity to training data.

Keywords: Applicability domain; Bioactivity; Cell Painting; Machine learning; Structure; Toxicity

Journal

Journal of cheminformatics
ISSN: 1758-2946
Abbreviated title: J Cheminform
Country: England
NLM ID: 101516718

Publication information

Publication date:
02 Jun 2023
History:
received: 05 Feb 2023
accepted: 20 Apr 2023
medline: 03 Jun 2023
pubmed: 03 Jun 2023
entrez: 02 Jun 2023
Status: epublish

Abstract

The applicability domain of machine learning models trained on structural fingerprints to predict biological endpoints is often limited by the lack of chemical-space diversity in the training data. In this work, we developed similarity-based merger models that combine the outputs of individual models trained on cell morphology (based on Cell Painting) and on chemical structure (based on chemical fingerprints) with the structural and morphological similarities of test compounds to the compounds in the training dataset. We implemented these merger models as logistic regression models that use the individual model predictions and the similarities as features, and applied them to predict hit calls for 177 assays from ChEMBL, PubChem and the Broad Institute (those for which the required Cell Painting annotations were available). The similarity-based merger models outperformed the other models: they achieved an AUC > 0.70 for 79 of the 177 assays, roughly 20% more assays than the structural models (65 of 177), compared with 50 of 177 for the Cell Painting models. Our results demonstrate that similarity-based merger models combining structure and cell morphology models can predict a wide range of biological assay outcomes more accurately, and that they further expand the applicability domain by extrapolating better to new structural and morphological spaces.
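The abstract describes the merger model only at a high level: a logistic regression whose features are the two base-model predictions plus the structural and morphological similarity of each test compound to the training set. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes fingerprints given as sets of on-bit indices compared with Tanimoto similarity, Cell Painting profiles compared with cosine similarity, similarity to the training set taken as the nearest-neighbour (maximum) similarity, and a plain gradient-descent logistic regression. All function names (`tanimoto`, `cosine`, `max_similarity`, `fit_logistic`, `merger_features`, `roc_auc`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two morphology profiles (an assumed metric, not from the paper)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def max_similarity(query, training: list, sim: Callable) -> float:
    """Assumed definition: similarity of the query to its nearest training compound."""
    return max(sim(query, t) for t in training)


def merger_features(p_struct, p_morph, fp, profile, train_fps, train_profiles) -> List[float]:
    """Merger input: two base-model probabilities plus two similarities to the training set."""
    return [
        p_struct,
        p_morph,
        max_similarity(fp, train_fps, tanimoto),
        max_similarity(profile, train_profiles, cosine),
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], x: List[float]) -> float:
    """Predicted hit probability; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Unregularised logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi  # gradient of the log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a random active outranks a random inactive (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive compound")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In this setup the merger would be fitted once per assay on `merger_features(...)` rows, and the per-assay `roc_auc` values would then be compared against the 0.70 threshold used in the abstract. Including the similarities as features lets the regression learn, for example, to down-weight the structural prediction when a test compound is structurally far from the training set.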

Identifiers

pubmed: 37268960
doi: 10.1186/s13321-023-00723-x
pii: 10.1186/s13321-023-00723-x
pmc: PMC10236827

Publication types

Journal Article

Languages

eng

Pagination

56

Grants

Agency: Cambridge Centre for Data Driven Discovery and Accelerate Programme for Scientific Discovery
ID: Theoretical Scientific and Philosophical Perspectives on Biological Understanding in the Age of Artificial Intelligence
Agency: Swedish Research Council
ID: 2020-03731
Agency: FORMAS
ID: 2018-00924

Copyright information

© 2023. The Author(s).

Authors

Srijit Seal (S)

Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK.

Hongbin Yang (H)

Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK.

Maria-Anna Trapotsi (MA)

Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK.

Satvik Singh (S)

Department of Applied Mathematics and Theoretical Physics (DAMTP), University of Cambridge, Cambridge, UK.

Jordi Carreras-Puigvert (J)

Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.

Ola Spjuth (O)

Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden. ola.spjuth@farmbio.uu.se.

Andreas Bender (A)

Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK. ab454@cam.ac.uk.

MeSH classifications