GASIDN: identification of sub-Golgi proteins with multi-scale feature fusion.

Golgi Apparatus / metabolism; Computational Biology / methods; Humans; Deep Learning

AlphaFold2; Deep representation learning; Graph neural network; Multi-scale features; Sub-Golgi proteins; TextCNN

Journal

BMC genomics

ISSN: 1471-2164

Abbreviated title: BMC Genomics

Country: England

NLM ID: 100965258

Publication information

Publication date:
30 Oct 2024

History:

received: 03 03 2024

accepted: 24 10 2024

medline: 31 10 2024

pubmed: 31 10 2024

entrez: 31 10 2024

Status: epublish

Abstract

The Golgi apparatus is a crucial component of the endomembrane system in eukaryotic cells, playing a central role in protein biosynthesis. Dysfunction of the Golgi apparatus has been linked to neurodegenerative diseases. Accurate identification of sub-Golgi protein types is therefore essential for developing effective treatments for such diseases. Because experimental methods for identifying sub-Golgi protein types are expensive and time-consuming, various computational methods have been developed as identification tools. However, most of these methods rely solely on neighboring features in the protein sequence and neglect the spatial structure of the protein.

To identify sub-Golgi proteins more accurately, we developed a model called GASIDN. GASIDN extracts multi-scale features by applying a 1D convolution module to protein sequences and a graph learning module to contact maps constructed from AlphaFold2-predicted structures.

The model uses the deep representation learning model SeqVec to initialize the protein sequence representations. GASIDN achieved accuracies of 98.4% and 96.4% in independent testing and ten-fold cross-validation, respectively, outperforming most previous predictors. To the best of our knowledge, this is the first method that uses multi-scale feature fusion to identify and locate sub-Golgi proteins. To assess the generalizability and scalability of the model, we also applied it to the identification of proteins from other organelles, including plant vacuoles and peroxisomes. The results were promising, indicating the effectiveness and versatility of the model. The source code and datasets are available on GitHub (https://github.com/SJNNNN/GASIDN).
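
The two feature paths described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the 8 Å C-alpha distance cutoff is an assumed (though common) choice that the abstract does not state, neighbor averaging stands in for the learned graph module, and a fixed smoothing kernel stands in for learned 1D convolution filters. In the real model, the per-residue features would come from SeqVec embeddings and the coordinates from AlphaFold2-predicted structures.

```python
import math

def contact_map(coords, threshold=8.0):
    """Binary contact map: 1 where two C-alpha atoms lie within `threshold` angstroms.
    The 8.0 default is an assumption, not a value taken from the abstract."""
    n = len(coords)
    return [[1 if math.dist(coords[i], coords[j]) <= threshold else 0
             for j in range(n)] for i in range(n)]

def graph_mean_aggregate(adj, feats):
    """One round of neighbor averaging over the contact graph.
    Each residue is its own neighbor (distance 0), so no neighbor list is empty."""
    out = []
    for row in adj:
        nbrs = [feats[j] for j, a in enumerate(row) if a]
        out.append([sum(col) / len(nbrs) for col in zip(*nbrs)])
    return out

def conv1d(feats, kernel):
    """Valid 1D convolution along the sequence, applied to each channel independently."""
    k = len(kernel)
    return [[sum(kernel[t] * feats[i + t][c] for t in range(k))
             for c in range(len(feats[0]))]
            for i in range(len(feats) - k + 1)]

def mean_pool(feats):
    """Average per-residue feature vectors into one fixed-length vector."""
    return [sum(col) / len(feats) for col in zip(*feats)]

def fuse(coords, feats, kernel=(0.25, 0.5, 0.25)):
    """Concatenate the pooled sequence-path and structure-path vectors."""
    seq_vec = mean_pool(conv1d(feats, kernel))
    graph_vec = mean_pool(graph_mean_aggregate(contact_map(coords), feats))
    return seq_vec + graph_vec

# Toy input: four residues on a line, with two-dimensional placeholder features.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
fused = fuse(coords, feats)
```

The fused vector has one block of values from the sequence path and one from the structure path; a classifier would then be trained on such vectors. Concatenation is only one possible fusion strategy and is chosen here for simplicity.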

Identifiers

DOI: 10.1186/s12864-024-10954-3 PMID: 39478465

pubmed: 39478465

doi: 10.1186/s12864-024-10954-3

pii: 10.1186/s12864-024-10954-3

Publication types

Journal Article

Languages

eng

Pagination

1019

Grants

Agency: Natural Science Foundation of Shandong Province

ID: ZR2021MF036

Agency: National Natural Science Foundation of China

ID: 31872415

Authors

Jianan Sui

Jiazi Chen

Yuehui Chen

Naoki Iwamori

Jin Sun
