Optimization of serine phosphorylation prediction in proteins by comparing human engineered features and deep representations.


Journal

Analytical biochemistry
ISSN: 1096-0309
Abbreviated title: Anal Biochem
Country: United States
NLM ID: 0370535

Publication information

Publication date:
15 02 2021
History:
received: 13 04 2020
revised: 15 11 2020
accepted: 14 12 2020
pubmed: 20 12 2020
medline: 23 6 2021
entrez: 19 12 2020
Status: ppublish

Abstract

Deep representations can replace human-engineered representations, which are constrained by certain limitations. To predict protein post-translational modification (PTM) sites, the research community uses different feature extraction techniques applied to pseudo amino acid composition (PseAAC). Serine phosphorylation is one of the most important PTMs, as it is the most frequently occurring and is important for various biological functions. Creating efficient representations from large protein sequences to predict PTM sites is a time- and resource-intensive task. In this study, we propose, implement, and evaluate the use of deep learning to learn effective protein data representations from PseAAC in order to develop data-driven PTM detection systems, and we compare these with two human-engineered representations. The comparisons are performed by training an XGBoost-based classifier on each representation. The best scores were achieved by the RNN-LSTM-based deep representation and the CNN-based representation, with accuracy scores of 81.1% and 78.3%, respectively. The human-engineered representations scored 77.3% and 74.9%, respectively. Based on these results, it is concluded that deep features are a promising replacement for feature engineering in identifying PhosS sites efficiently and accurately, which can help scientists understand the mechanism of this modification in proteins.

Identifiers

pubmed: 33340540
pii: S0003-2697(20)30601-1
doi: 10.1016/j.ab.2020.114069

Chemical substances

Amino Acids 0
Proteins 0
Serine 452VLY9402

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

114069

Copyright information

Copyright © 2020 Elsevier Inc. All rights reserved.

Authors

Sheraz Naseer (S)

Department of Computer Science, University of Management and Technology, Lahore, Pakistan. Electronic address: sheraz.naseer@umt.edu.pk.

Waqar Hussain (W)

National Center of Artificial Intelligence, Punjab University College of Information Technology, University of the Punjab, Lahore, Pakistan; Center for Professional & Applied Studies, Lahore, Pakistan.

Yaser Daanial Khan (YD)

Department of Computer Science, University of Management and Technology, Lahore, Pakistan.

Nouman Rasool (N)

Center for Professional & Applied Studies, Lahore, Pakistan.
