COYOTE: Sequence-derived structural descriptors-based computational identification of glycoproteins.
Machine learning
glycoproteins
predictive modeling
protein interactions
sequence information
Journal
Journal of bioinformatics and computational biology
ISSN: 1757-6334
Abbreviated title: J Bioinform Comput Biol
Country: Singapore
NLM ID: 101187344
Publication information
Publication date:
10 2022
History:
pubmed: 2022-09-14
medline: 2022-11-16
entrez: 2022-09-13
Status:
ppublish
Abstract
Glycoproteins play an important and ubiquitous role in many biological processes, such as protein folding, cell-to-cell signaling, infection by invading microorganisms, tumor metastasis, and leukocyte trafficking. The key mechanisms of glycoproteins must be revealed to model and refine glycosylated protein recognition, which will eventually assist in the design and discovery of carbohydrate-derived therapeutics. Wet-lab experiments to identify glycoproteins are time-consuming, laborious, and costly. These experimental procedures can, however, be assisted by computational methods that rank the most probable glycoproteins with improved accuracy. In this study, we propose a novel machine learning-based predictive model for glycoprotein identification. Our proposed model is based on sequence-derived structural descriptors (SDSD), which compensate for the unavailability of protein 3D structures and for the limited accuracy of sequence information alone. Through a series of simulation studies, we show that our proposed model gives state-of-the-art generalization performance, verified through various machine learning-centric and biologically relevant techniques and metrics. Through data mining, we have also identified the role of individual descriptors in determining glycoproteins. Python-based standalone code, together with a webserver implementation of our proposed model (COYOTE: identifiCation Of glYcoprOteins Through sEquences), is available at the URL: https://sites.google.com/view/wajidarshad/software.
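Illustrative note: the abstract does not list the descriptors that make up SDSD, so the following is only a minimal sketch of what deriving numeric features from a raw protein sequence can look like. It assumes two simple stand-in features that are not taken from the paper: the amino acid composition and a count of the well-known N-linked glycosylation sequon N-X-S/T (X not proline). The function names `composition`, `sequon_count`, and `describe` are hypothetical and are not part of the COYOTE code.

```python
import re
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# N-linked glycosylation sequon: N, then any residue except P, then S or T.
# A lookahead is used so that overlapping sequons are each counted.
SEQUON = re.compile(r"N(?=[^P][ST])")


def composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    total = len(seq)
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def sequon_count(sequence):
    """Count N-X-S/T sequons (X != P), including overlapping ones."""
    return len(SEQUON.findall(sequence.upper()))


def describe(sequence):
    """Assumed 21-dimensional feature vector: composition plus sequon count.

    This is a stand-in for illustration, not the SDSD used by COYOTE.
    """
    return composition(sequence) + [float(sequon_count(sequence))]
```

A vector such as `describe("MKNGSLLNPT")` could then be passed to any standard classifier; which classifier and which descriptors the authors actually use is not stated in this record.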
Identifiers
pubmed: 36098715
doi: 10.1142/S0219720022500196
Chemical substances
Glycoproteins (registry number: 0)
Publication types
Journal Article
Languages
eng
Citation subsets
IM