Protein features fusion using attributed network embedding for predicting protein-protein interaction.


Journal

BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258

Publication information

Publication date:
13 May 2024
History:
received: 10 Jan 2024
accepted: 29 Apr 2024
medline: 14 May 2024
pubmed: 14 May 2024
entrez: 13 May 2024
Status: epublish

Abstract

Protein-protein interactions (PPIs) are central to biology, and accurate PPI prediction is key to understanding cellular processes and supporting drug design. However, experimental determination of PPIs is laborious, time-consuming, and often constrained by technical limitations. We introduce FFANE, a new node representation method based on initial information fusion, which combines PPI networks and protein sequence data to improve the accuracy of PPI prediction. First, a Gaussian kernel similarity matrix is built from protein structural similarities. In parallel, protein sequence similarities are measured using the Levenshtein distance, which captures a different set of protein attributes. These two feature matrices are then merged by weighted fusion into an initial information matrix that combines structural and sequence details. A Stacked Autoencoder (SAE) then encodes the fused features to yield more representative features. Finally, classification models are trained on the learned fusion features to predict PPIs. In 5-fold cross-validation experiments with an SVM, the proposed method achieved average accuracies of 94.28%, 97.69%, and 84.05% on the Saccharomyces cerevisiae, Homo sapiens, and Helicobacter pylori datasets, respectively. Experimental results across several real datasets support the efficacy and superiority of this fusion feature representation approach and underscore its potential value in bioinformatics.
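
The fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the choices below are assumptions made only for illustration. These include computing the Gaussian kernel on rows of a toy adjacency matrix, normalizing the Levenshtein distance by the longer sequence length, the parameters gamma and alpha, and all function names. The Stacked Autoencoder and SVM stages are omitted.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sequence_similarity(seqs):
    """Pairwise similarity 1 - d / max(len); this normalization is an assumption."""
    n = len(seqs)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            longest = max(len(seqs[i]), len(seqs[j])) or 1
            s = 1.0 - levenshtein(seqs[i], seqs[j]) / longest
            sim[i][j] = sim[j][i] = s
    return sim


def gaussian_kernel_similarity(profiles, gamma=1.0):
    """exp(-gamma * ||p_i - p_j||^2) between rows (assumed interaction profiles)."""
    n = len(profiles)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq = sum((x - y) ** 2 for x, y in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq)
    return sim


def weighted_fusion(struct_sim, seq_sim, alpha=0.5):
    """Element-wise convex combination; alpha is a hypothetical fusion weight."""
    return [[alpha * a + (1.0 - alpha) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(struct_sim, seq_sim)]


# Toy data: three made-up sequences and a made-up 3-node interaction network.
sequences = ["MKVLA", "MKVLG", "TTPQR"]
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]

seq_sim = sequence_similarity(sequences)
net_sim = gaussian_kernel_similarity(adjacency, gamma=1.0)
fused = weighted_fusion(net_sim, seq_sim, alpha=0.5)
```

In this sketch each row of fused would serve as the initial feature vector of one protein, which a downstream encoder and classifier could then consume. The weight alpha controls how much the network-derived similarity counts relative to the sequence-derived one.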

Identifiers

pubmed: 38741045
doi: 10.1186/s12864-024-10361-8
pii: 10.1186/s12864-024-10361-8

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

466

Grants

Agency: Ministry of Higher Education, Malaysia
ID: FRGS/1/2022/ICT02/UKM/02/7

Copyright information

© 2024. The Author(s).

Authors

Mei-Yuan Cao (MY)

Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia. p116930@siswa.ukm.edu.my.

Suhaila Zainudin (S)

Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia.

Kauthar Mohd Daud (KM)

Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia.


MeSH classifications