Bridging chemical structure and conceptual knowledge enables accurate prediction of compound-protein interaction.

Keywords: Compound-protein interactions; Incomplete multi-modal representation learning; Knowledge graphs; Self-supervised learning

Journal

BMC biology
ISSN: 1741-7007
Abbreviated title: BMC Biol
Country: England
NLM ID: 101190720

Publication information

Publication date:
29 Oct 2024
History:
received: 15 Jul 2024
accepted: 17 Oct 2024
medline: 29 Oct 2024
pubmed: 29 Oct 2024
entrez: 29 Oct 2024
Status: epublish

Abstract

Accurate prediction of compound-protein interaction (CPI) plays a crucial role in drug discovery. Existing data-driven methods learn from the chemical structures of compounds and proteins yet ignore conceptual knowledge, that is, the interrelationships among the fundamental elements of a biomedical knowledge graph (KG). Knowledge graphs provide a comprehensive view of entities and relationships beyond individual compounds and proteins. They encompass a wealth of information such as pathways, diseases, and biological processes, offering a richer context for CPI prediction. This contextual information can be used to identify indirect interactions, infer potential relationships, and improve prediction accuracy. In real-world applications, however, the prevalence of compounds and proteins that lack knowledge is a critical barrier to injecting knowledge into data-driven models. Here, we propose BEACON, a data and knowledge dual-driven framework that bridges chemical structure and conceptual knowledge for CPI prediction. BEACON learns consistent representations by maximizing the mutual information between chemical structure and conceptual knowledge, and predicts missing representations by minimizing their conditional entropy. BEACON achieves state-of-the-art performance on multiple datasets compared with competing methods, notably with performance gains of 5.1% and 6.6% on the BIOSNAP and DrugBank datasets, respectively. Moreover, BEACON is the only approach capable of effectively predicting knowledge representations for knowledge-lacking compounds and proteins. Overall, our work provides a general approach for directly injecting conceptual knowledge to enhance the performance of CPI prediction.
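
The abstract names two objectives without giving their form: maximizing mutual information between structure and knowledge representations, and minimizing conditional entropy to predict missing representations. The sketch below illustrates one common way such objectives are approximated, and it is not taken from the BEACON paper. It assumes an InfoNCE contrastive loss as the mutual-information surrogate (for a batch of N pairs, mutual information is bounded below by log N minus the loss) and a mean squared error as the conditional-entropy surrogate (under a fixed-variance Gaussian assumption, the negative log-likelihood upper-bounds the conditional entropy). The function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math

EPS = 1e-12  # guards against division by zero for all-zero vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + EPS)


def info_nce(structure_emb, knowledge_emb, temperature=0.5):
    """InfoNCE loss over a batch of paired embeddings (assumed surrogate).

    Row i of each list describes the same entity. The matching row is the
    positive; every other knowledge row in the batch acts as a negative.
    """
    n = len(structure_emb)
    total = 0.0
    for i in range(n):
        logits = [cosine(structure_emb[i], k) / temperature
                  for k in knowledge_emb]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum - logits[i]  # negative log softmax of the positive
    return total / n


def prediction_loss(predicted, target):
    """Mean squared error between predicted and observed knowledge embeddings."""
    count = sum(len(row) for row in target)
    squared = sum((p - t) ** 2
                  for p_row, t_row in zip(predicted, target)
                  for p, t in zip(p_row, t_row))
    return squared / count


# Toy check: consistent pairs should score a lower loss than mismatched pairs.
structure = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(structure, aligned) < info_nce(structure, swapped))
```

In a setup of this kind, a predictor trained with `prediction_loss` on entities that have both views could then supply a knowledge embedding for a compound or protein absent from the knowledge graph, using only its structure embedding as input.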

Identifiers

pubmed: 39468510
doi: 10.1186/s12915-024-02049-y
pii: 10.1186/s12915-024-02049-y

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

248

Copyright information

© 2024. The Author(s).

Authors

Wen Tao (W)

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China.

Xuan Lin (X)

School of Computer Science, Xiangtan University, Xiangtan, 411105, Hunan, China.
Laboratory of Intelligent Computing and Information Processing, Ministry of Education (Xiangtan University), Xiangtan, 411105, Hunan, China.

Yuansheng Liu (Y)

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China. yuanshengliu@hnu.edu.cn.
Laboratory of Intelligent Computing and Information Processing, Ministry of Education (Xiangtan University), Xiangtan, 411105, Hunan, China. yuanshengliu@hnu.edu.cn.

Li Zeng (L)

Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai, 201109, Shanghai, China.

Tengfei Ma (T)

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China.

Ning Cheng (N)

School of Informatics, Hunan University of Chinese Medicine, Changsha, 410208, Hunan, China.

Jing Jiang (J)

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China.

Xiangxiang Zeng (X)

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China.

Sisi Yuan (S)

Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, 28223, NC, USA. syuan4@charlotte.edu.
