Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date:
09 2020
History:
received: 23 08 2019
accepted: 27 03 2020
pubmed: 23 5 2020
medline: 11 11 2020
entrez: 23 5 2020
Status: ppublish
Abstract
Small molecules are usually compared by their chemical structure, but there is no unified analytic framework for representing and comparing their biological activity. We present the Chemical Checker (CC), which provides processed, harmonized and integrated bioactivity data on ~800,000 small molecules. The CC divides data into five levels of increasing complexity, from the chemical properties of compounds to their clinical outcomes. In between, it includes targets, off-targets, networks and cell-level information, such as omics data, growth inhibition and morphology. Bioactivity data are expressed in a vector format, extending the concept of chemical similarity to similarity between bioactivity signatures. We show how CC signatures can aid drug discovery tasks, including target identification and library characterization. We also demonstrate the discovery of compounds that reverse and mimic biological signatures of disease models and genetic perturbations in cases that could not be addressed using chemical information alone. Overall, the CC signatures facilitate the conversion of bioactivity data to a format that is readily amenable to machine learning methods.
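As a rough illustration of the vector-based comparison described in the abstract, the sketch below ranks compounds by the cosine similarity between a query signature and a small library of signatures. The signature values, the compound names and the choice of cosine similarity are assumptions made for illustration only; they are not taken from the Chemical Checker data or its software.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("signatures must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, sig)) for name, sig in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy signatures (invented values, four dimensions each).
disease_signature = [1.0, -0.5, 0.8, 0.0]
library = {
    "compound_A": [0.9, -0.4, 0.7, 0.1],   # points the same way: candidate "mimic"
    "compound_B": [-1.0, 0.5, -0.8, 0.0],  # exact opposite: candidate "reverser"
    "compound_C": [0.0, 0.0, 0.0, 1.0],    # orthogonal: unrelated
}

ranking = rank_by_similarity(disease_signature, library)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

Under these assumptions, compounds at the top of the ranking would be candidates that mimic the query signature, and compounds at the bottom (similarity near -1) would be candidates that reverse it.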
Identifiers
pubmed: 32440005
doi: 10.1038/s41587-020-0502-7
pii: 10.1038/s41587-020-0502-7
Chemical substances
Biological Products
0
Biomarkers, Pharmacological
0
Pharmaceutical Preparations
0
Small Molecule Libraries
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1087-1096
Comments and corrections
Type: ErratumIn
References
Sterling, T. & Irwin, J. J. ZINC 15—ligand discovery for everyone. J. Chem. Inf. Model. 55, 2324–2337 (2015).
doi: 10.1021/acs.jcim.5b00559
Gaulton, A. et al. The ChEMBL database in 2017. Nucleic Acids Res. 45, D945–D954 (2017).
pubmed: 27899562
doi: 10.1093/nar/gkw1074
Wang, Y. et al. PubChem BioAssay: 2017 update. Nucleic Acids Res. 45, D955–D963 (2017).
pubmed: 27899599
doi: 10.1093/nar/gkw1118
Wishart, D. S. Chapter 3: small molecules and disease. PLoS Comput. Biol. 8, e1002805 (2012).
pubmed: 23300405
pmcid: 3531289
doi: 10.1371/journal.pcbi.1002805
Duran-Frigola, M., Rossell, D. & Aloy, P. A chemo-centric view of human health and disease. Nat. Commun. 5, 5676 (2014).
doi: 10.1038/ncomms6676
Rouillard, A. D. et al. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database 2016, baw100 (2016).
pubmed: 27374120
pmcid: 4930834
doi: 10.1093/database/baw100
Newman, D. J. & Cragg, G. M. Natural products as sources of new drugs from 1981 to 2014. J. Nat. Prod. 79, 629–661 (2016).
pubmed: 26852623
doi: 10.1021/acs.jnatprod.5b01055
Rodrigues, T., Reker, D., Schneider, P. & Schneider, G. Counting on natural products for drug design. Nat. Chem. 8, 531–541 (2016).
pubmed: 27219696
doi: 10.1038/nchem.2479
Welsch, M. E., Snyder, S. A. & Stockwell, B. R. Privileged scaffolds for library design and drug discovery. Curr. Opin. Chem. Biol. 14, 347–361 (2010).
pubmed: 20303320
pmcid: 2908274
doi: 10.1016/j.cbpa.2010.02.018
Bleicher, K. H., Böhm, H.-J., Müller, K. & Alanine, A. I. Hit and lead generation: beyond high-throughput screening. Nat. Rev. Drug Discov. 2, 369–378 (2003).
doi: 10.1038/nrd1086
Holbeck, S. L., Collins, J. M. & Doroshow, J. H. Analysis of Food and Drug Administration–approved anticancer agents in the NCI60 panel of human tumor cell lines. Mol. Cancer Ther. 9, 1451–1460 (2010).
doi: 10.1158/1535-7163.MCT-10-0106
Seashore-Ludlow, B. et al. Harnessing connectivity in a large-scale small-molecule sensitivity dataset. Cancer Discov. 5, 1210–1223 (2015).
pubmed: 26482930
pmcid: 4631646
doi: 10.1158/2159-8290.CD-15-0235
Campillos, M., Kuhn, M., Gavin, A.-C., Jensen, L. J. & Bork, P. Drug target identification using side-effect similarity. Science 321, 263–266 (2008).
pubmed: 18621671
doi: 10.1126/science.1158140
Petrone, P. M. et al. Rethinking molecular similarity: comparing compounds on the basis of biological activity. ACS Chem. Biol. 7, 1399–1409 (2012).
pubmed: 22594495
doi: 10.1021/cb3001028
Papadatos, G., Gaulton, A., Hersey, A. & Overington, J. P. Activity, assay and target data curation and quality in the ChEMBL database. J. Comput. Aided Mol. Des. 29, 885–896 (2015).
pubmed: 26201396
pmcid: 4607714
doi: 10.1007/s10822-015-9860-5
Duran-Frigola, M., Mateo, L. & Aloy, P. Drug repositioning beyond the low-hanging fruits. Curr. Opin. Syst. Biol. 3, 95–102 (2017).
doi: 10.1016/j.coisb.2017.04.010
Nguyen, D. T. et al. Pharos: collating protein information to shed light on the druggable genome. Nucleic Acids Res. 45, D995–D1002 (2017).
pubmed: 27903890
doi: 10.1093/nar/gkw1072
Duran-Frigola, M., Fernandez-Torras, A., Bertoni, M. & Aloy, P. Formatting biological big data for modern machine learning in drug discovery. WIREs Comp. Mol. Sci. 9, e1408 (2018).
Corsello, S. M. et al. The Drug Repurposing Hub: a next-generation drug library and information resource. Nat. Med. 23, 405–408 (2017).
pubmed: 28388612
pmcid: 5568558
doi: 10.1038/nm.4306
Jokinen, E. & Koivunen, J. P. MEK and PI3K inhibition in solid tumors: rationale and evidence to date. Ther. Adv. Med. Oncol. 7, 170–180 (2015).
pubmed: 26673580
pmcid: 4406912
doi: 10.1177/1758834015571111
Lamb, J. et al. The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929–1935 (2006).
doi: 10.1126/science.1132939
pubmed: 17008526
Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452 (2017).
pubmed: 29195078
pmcid: 5990023
doi: 10.1016/j.cell.2017.10.049
Filzen, T. M., Kutchukian, P. S., Hermes, J. D., Li, J. & Tudor, M. Representing high throughput expression profiles via perturbation barcodes reveals compound targets. PLoS Comput. Biol. 13, e1005335 (2017).
pubmed: 28182661
pmcid: 5300121
doi: 10.1371/journal.pcbi.1005335
Chen, B. et al. Reversal of cancer gene expression correlates with drug efficacy and reveals therapeutic targets. Nat. Commun. 8, 16022 (2017).
pubmed: 28699633
pmcid: 5510182
doi: 10.1038/ncomms16022
Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
pubmed: 27397505
pmcid: 4967469
doi: 10.1016/j.cell.2016.06.017
Encinas, M. et al. Sequential treatment of SH-SY5Y cells with retinoic acid and brain-derived neurotrophic factor gives rise to fully differentiated, neurotrophic factor-dependent, human neuron-like cells. J. Neurochem. 75, 991–1003 (2000).
pubmed: 10936180
doi: 10.1046/j.1471-4159.2000.0750991.x
Tanzi, R. E. The genetics of Alzheimer disease. Cold Spring Harb. Perspect. Med. 2, a006296 (2012).
pubmed: 23028126
pmcid: 3475404
doi: 10.1101/cshperspect.a006296
Carvalho-Silva, D. et al. Open Targets Platform: new developments and updates two years on. Nucleic Acids Res. 47, D1056–D1065 (2019).
pubmed: 30462303
doi: 10.1093/nar/gky1133
Perszyk, R. E. et al. GluN2D-containing N-methyl-D-aspartate receptors mediate synaptic transmission in hippocampal interneurons and regulate interneuron activity. Mol. Pharmacol. 90, 689–702 (2016).
pubmed: 27625038
pmcid: 5118640
doi: 10.1124/mol.116.105130
Harold, D. et al. Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer’s disease. Nat. Genet. 41, 1088–1093 (2009).
pubmed: 19734902
pmcid: 2845877
doi: 10.1038/ng.440
Anselmo, A. C., Gokarn, Y. & Mitragotri, S. Non-invasive delivery strategies for biologics. Nat. Rev. Drug Discov. 18, 19–40 (2018).
pubmed: 30498202
doi: 10.1038/nrd.2018.183
Depper, J. M., Leonard, W. J., Robb, R. J., Waldmann, T. A. & Greene, W. C. Blockade of the interleukin-2 receptor by anti-Tac antibody: inhibition of human lymphocyte activation. J. Immunol. 131, 690–696 (1983).
pubmed: 6408186
Benson, J. M. et al. Therapeutic targeting of the IL-12/23 pathways: generation and characterization of ustekinumab. Nat. Biotechnol. 29, 615–624 (2011).
pubmed: 21747388
doi: 10.1038/nbt.1903
Reddy, M. et al. Modulation of CLA, IL-12R, CD40L, and IL-2Ralpha expression and inhibition of IL-12- and IL-23-induced cytokine secretion by CNTO 1275. Cell Immunol. 247, 1–11 (2007).
pubmed: 17761156
doi: 10.1016/j.cellimm.2007.06.006
Xu, M. J., Johnson, D. E. & Grandis, J. R. EGFR-targeted therapies in the post-genomic era. Cancer Metastasis Rev. 36, 463–473 (2017).
pubmed: 28866730
pmcid: 5693744
doi: 10.1007/s10555-017-9687-8
Masuelli, L. et al. Apigenin induces apoptosis and impairs head and neck carcinomas EGFR/ErbB2 signaling. Front. Biosci. 16, 1060–1068 (2011).
doi: 10.2741/3735
Hu, W. J., Liu, J., Zhong, L. K. & Wang, J. Apigenin enhances the antitumor effects of cetuximab in nasopharyngeal carcinoma by inhibiting EGFR signaling. Biomed. Pharmacother. 102, 681–688 (2018).
pubmed: 29604587
doi: 10.1016/j.biopha.2018.03.111
Sawai, A. et al. Inhibition of Hsp90 down-regulates mutant epidermal growth factor receptor (EGFR) expression and sensitizes EGFR mutant tumors to paclitaxel. Cancer Res. 68, 589–596 (2008).
pubmed: 18199556
pmcid: 4011195
doi: 10.1158/0008-5472.CAN-07-1570
Williams, A. J. et al. Open PHACTS: semantic interoperability for drug discovery. Drug Discov. Today 17, 1188–1198 (2012).
doi: 10.1016/j.drudis.2012.05.016
Rodgers, G. et al. Glimmers in illuminating the druggable genome. Nat. Rev. Drug Discov. 17, 301–302 (2018).
doi: 10.1038/nrd.2017.252
Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).
pubmed: 29629118
doi: 10.1039/C7SC02664A
Lee, Y. S. et al. A computational framework for genome-wide characterization of the human disease landscape. Cell Syst. 8, 152–162 (2019).
pubmed: 30685436
pmcid: 7374759
doi: 10.1016/j.cels.2018.12.010
Mendez-Lucio, O., Baillif, B., Clevert, D. A., Rouquie, D. & Wichard, J. De novo generation of hit-like molecules from gene expression signatures using artificial intelligence. Nat. Commun. 11, 10 (2020).
pubmed: 31900408
pmcid: 6941972
doi: 10.1038/s41467-019-13807-w
Reymond, J.-L. The Chemical Space Project. Acc. Chem. Res. 48, 722–730 (2015).
pubmed: 25687211
doi: 10.1021/ar500432k
Irwin, J. J., Gaskins, G., Sterling, T., Mysinger, M. M. & Keiser, M. J. Predicted biological activity of purchasable chemical space. J. Chem. Inf. Model. 58, 148–164 (2018).
doi: 10.1021/acs.jcim.7b00316
Wang, B. et al. Similarity network fusion for aggregating data types on a genomic scale. Nat. Methods 11, 333–337 (2014).
pubmed: 24464287
doi: 10.1038/nmeth.2810
Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).
pubmed: 22270643
pmcid: 3524573
doi: 10.1038/nchem.1243
Axen, S. D. et al. A simple representation of three-dimensional molecular structure. J. Med. Chem. 60, 7393–7409 (2017).
pubmed: 28731335
pmcid: 6075869
doi: 10.1021/acs.jmedchem.7b00696
Bemis, G. W. & Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893 (1996).
pubmed: 8709122
doi: 10.1021/jm9602928
Durant, J. L., Leland, B. A., Henry, D. R. & Nourse, J. G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 42, 1273–1280 (2002).
pubmed: 12444722
doi: 10.1021/ci010132r
Lipinski, C. A. Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today Technol. 1, 337–341 (2004).
doi: 10.1016/j.ddtec.2004.11.007
pubmed: 24981612
Congreve, M., Carr, R., Murray, C. & Jhoti, H. A ‘rule of three’ for fragment-based lead discovery? Drug Discov. Today 8, 876–877 (2003).
pubmed: 14554012
doi: 10.1016/S1359-6446(03)02831-9
Wishart, D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46, D1074–D1082 (2018).
doi: 10.1093/nar/gkx1037
pubmed: 29126136
Cheng, H. et al. ECOD: an evolutionary classification of protein domains. PLoS Comput. Biol. 10, e1003926 (2014).
pubmed: 25474468
pmcid: 4256011
doi: 10.1371/journal.pcbi.1003926
Gilson, M. K. et al. BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res. 44, D1045–D1053 (2016).
pubmed: 26481362
doi: 10.1093/nar/gkv1072
Hastings, J. et al. ChEBI in 2016: improved services and an expanding collection of metabolites. Nucleic Acids Res. 44, D1214–D1219 (2016).
pubmed: 26467479
doi: 10.1093/nar/gkv1031
Thiele, I. et al. A community-driven global reconstruction of human metabolism. Nat. Biotechnol. 31, 419–425 (2013).
pubmed: 23455439
doi: 10.1038/nbt.2488
Cerami, E. G. et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 39, D685–D690 (2011).
pubmed: 21071392
doi: 10.1093/nar/gkq1039
Fabregat, A. et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).
doi: 10.1093/nar/gkx1132
pubmed: 29145629
Pryszcz, L. P., Huerta-Cepas, J. & Gabaldon, T. MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score. Nucleic Acids Res. 39, e32 (2011).
pubmed: 21149260
doi: 10.1093/nar/gkq953
Kruger, F. A. & Overington, J. P. Global analysis of small molecule binding to related protein targets. PLoS Comput. Biol. 8, e1002333 (2012).
pubmed: 22253582
pmcid: 3257267
doi: 10.1371/journal.pcbi.1002333
Zwierzyna, M. & Overington, J. P. Classification and analysis of a large collection of in vivo bioassay descriptions. PLoS Comput. Biol. 13, e1005641 (2017).
pubmed: 28678787
pmcid: 5517062
doi: 10.1371/journal.pcbi.1005641
Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Res. 45, D362–D368 (2017).
doi: 10.1093/nar/gkw937
pubmed: 27924014
Li, T. et al. A scored human protein–protein interaction network to catalyze genomic interpretation. Nat. Methods 14, 61–64 (2017).
pubmed: 27892958
doi: 10.1038/nmeth.4083
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
doi: 10.1093/nar/gkv1070
pubmed: 26476454
Kandasamy, K. et al. NetPath: a public resource of curated signal transduction pathways. Genome Biol. 11, R3 (2010).
pubmed: 20067622
pmcid: 2847715
Mi, H. et al. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Res. 45, D183–D189 (2017).
pubmed: 27899595
doi: 10.1093/nar/gkw1138
Kelder, T. et al. WikiPathways: building research communities on biological pathways. Nucleic Acids Res. 40, D1301–D1307 (2012).
pubmed: 22096230
doi: 10.1093/nar/gkr1074
Mosca, R., Ceol, A. & Aloy, P. Interactome3D: adding structural details to protein networks. Nat. Methods 10, 47–53 (2013).
pubmed: 23399932
doi: 10.1038/nmeth.2289
Leiserson, M. D. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).
pubmed: 25501392
doi: 10.1038/ng.3168
Iorio, F. et al. Discovery of drug mode of action and drug repositioning from transcriptional responses. Proc. Natl Acad. Sci. USA 107, 14621–14626 (2010).
pubmed: 20679242
pmcid: 2930479
doi: 10.1073/pnas.1000138107
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
pubmed: 22460905
pmcid: 3320027
doi: 10.1038/nature11003
Basu, A. et al. An interactive resource to identify cancer genetic and lineage dependencies targeted by small molecules. Cell 154, 1151–1161 (2013).
pubmed: 23993102
pmcid: 3954635
doi: 10.1016/j.cell.2013.08.003
Chabner, B. A. NCI-60 cell line screening: a radical departure in its time. J. Natl Cancer Inst. 108, djv388 (2016).
pubmed: 26755050
doi: 10.1093/jnci/djv388
Azur, M. J., Stuart, E. A., Frangakis, C. & Leaf, P. J. Multiple imputation by chained equations: what is it and how does it work? Int. J. Meth. Psychiatr. Res. 20, 40–49 (2011).
doi: 10.1002/mpr.329
Nelson, J. et al. MOSAIC: a chemical-genetic interaction data repository and web resource for exploring chemical modes of action. Bioinformatics 34, 1251–1252 (2017).
pmcid: 6031042
doi: 10.1093/bioinformatics/btx732
Wawer, M. J. et al. Toward performance-diverse small-molecule libraries for cell-based phenotypic screening using multiplexed high-dimensional profiling. Proc. Natl Acad. Sci. USA 111, 10911–10916 (2014).
pubmed: 25024206
pmcid: 4121832
doi: 10.1073/pnas.1410933111
Brown, A. S. & Patel, C. J. A standard database for drug repositioning. Sci. Data 4, 170029 (2017).
pubmed: 28291243
pmcid: 5349249
doi: 10.1038/sdata.2017.29
Piñero, J. et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res. 45, D833–D839 (2017).
doi: 10.1093/nar/gkw943
pubmed: 27924018
Kuhn, M., Letunic, I., Jensen, L. J. & Bork, P. The SIDER database of drugs and side effects. Nucleic Acids Res. 44, D1075–D1079 (2016).
pubmed: 26481350
doi: 10.1093/nar/gkv1075
Kuhn, M. et al. Systematic identification of proteins that elicit drug side effects. Mol. Syst. Biol. 9, 663 (2013).
pubmed: 23632385
pmcid: 3693830
doi: 10.1038/msb.2013.10
Duran-Frigola, M. & Aloy, P. Analysis of chemical and biological features yields mechanistic insights into drug side effects. Chem. Biol. 20, 594–603 (2013).
pubmed: 23601648
doi: 10.1016/j.chembiol.2013.03.017
Davis, A. P. et al. The Comparative Toxicogenomics Database: update 2017. Nucleic Acids Res. 45, D972–D978 (2017).
pubmed: 27651457
doi: 10.1093/nar/gkw838
Ryu, J. Y., Kim, H. W. & Lee, S. Y. Deep learning improves prediction of drug–drug and drug–food interactions. Proc. Natl Acad. Sci. USA 115, 4304–4311 (2018).
doi: 10.1073/pnas.1803294115
Grover, A. & Leskovec, J. node2vec: scalable feature learning for networks. Preprint at https://arxiv.org/abs/1607.00653 (2016).
Matsui, Y., Ogaki, K., Yamasaki, T. & Aizawa, K. PQk-means: billion-scale clustering for product-quantized codes. Preprint at https://arxiv.org/abs/1709.03708 (2017).
Maaten, L. v. d. Barnes–Hut-SNE. Preprint at https://arxiv.org/abs/1301.3342 (2013).
McInnes, L. & Healy, J. Accelerated hierarchical density based clustering. Proc. 2017 IEEE International Conference on Data Mining Workshops (IEEE, 2017).
Webber, W., Moffat, A. & Zobel, J. A similarity measure for indefinite rankings. ACM Trans. Inf. Syst. 28, 1–38 (2010).
doi: 10.1145/1852102.1852106
Lo, Y. C. et al. Large-scale chemical similarity networks for target profiling of compounds identified in cell-based chemical screens. PLoS Comput. Biol. 11, e1004153 (2015).
pubmed: 25826798
pmcid: 4380459
doi: 10.1371/journal.pcbi.1004153
Rennie, J. D. M., Shih, L., Teevan, J. & Karger, D. R. Tackling the poor assumptions of naive Bayes text classifiers. Proc. International Conference on Machine Learning 616–623 (AAAI Press, 2003).
Irwin, J. J. & Shoichet, B. K. ZINC–a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 45, 177–182 (2005).
pubmed: 15667143
pmcid: 1360656
doi: 10.1021/ci049714+
Fernandez-Torras, A., Duran-Frigola, M. & Aloy, P. Encircling the regions of the pharmacogenomic landscape that determine drug response. Genome Med. 11, 17 (2019).
Badia, R. et al. SAMHD1 is active in cycling cells permissive to HIV-1 infection. Antiviral Res. 142, 123–135 (2017).
pubmed: 28359840
doi: 10.1016/j.antiviral.2017.03.019
Saxena, V., Orgill, D. & Kohane, I. Absolute enrichment: gene set enrichment analysis for homeostatic systems. Nucleic Acids Res. 34, e151 (2006).
pubmed: 17130162
pmcid: 1702493
doi: 10.1093/nar/gkl766