TamGen: drug design with target-aware molecule generation through a chemical language model.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
29 Oct 2024
History:
received: 15 Mar 2024
accepted: 14 Oct 2024
medline: 30 Oct 2024
pubmed: 30 Oct 2024
entrez: 30 Oct 2024
Status:
epublish
Abstract
Generative drug design facilitates the creation of compounds effective against pathogenic target proteins. This opens up the potential to discover novel compounds within the vast chemical space and fosters the development of innovative therapeutic strategies. However, the practicality of generated molecules is often limited, as many designs focus on a narrow set of drug-related properties, failing to improve the success rate of the subsequent drug discovery process. To overcome these challenges, we develop TamGen, a method that employs a GPT-like chemical language model and enables target-aware molecule generation and compound refinement. We demonstrate that the compounds generated by TamGen have improved molecular quality and viability. Additionally, we have integrated TamGen into a drug discovery pipeline and identified 14 compounds showing compelling inhibitory activity against the Tuberculosis ClpP protease, with the most effective compound exhibiting a half maximal inhibitory concentration (IC
Identifiers
pubmed: 39472567
doi: 10.1038/s41467-024-53632-4
pii: 10.1038/s41467-024-53632-4
Chemical substances
Endopeptidase Clp (EC 3.4.21.92)
Antitubercular Agents (0)
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
9360
Copyright information
© 2024. The Author(s).