Computational identification of 4-carboxyglutamate sites to supplement physiological studies using deep learning.


Journal

Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
2022-01-07
History:
received: 2021-09-28
accepted: 2021-12-03
entrez: 2022-01-08
pubmed: 2022-01-09
medline: 2022-03-03
Status: epublish

Abstract

In biological systems, glutamic acid is a crucial amino acid used in protein biosynthesis. Carboxylation of glutamic acid is a significant post-translational modification that plays an important role in blood coagulation by activating prothrombin to thrombin. Conversely, 4-carboxyglutamate is also involved in diseases including plaque atherosclerosis, osteoporosis, mineralized heart valves and bone resorption, and it serves as a biomarker for the onset of these diseases. Owing to the pathophysiological significance of 4-carboxyglutamate, identifying it is important for better understanding pathophysiological systems. Wet-lab identification of prospective 4-carboxyglutamate sites is costly, laborious and time-consuming because of the inherent difficulties of in vivo, ex vivo and in vitro experiments. To supplement these experiments, we proposed, implemented and evaluated a different approach to developing 4-carboxyglutamate site predictors using pseudo amino acid compositions (PseAAC) and deep neural networks (DNNs). Our approach does not require any feature extraction; it employs deep neural networks to learn feature representations of peptide sequences and to perform the classification. The proposed approach is validated using standard performance evaluation metrics. Among the deep neural networks evaluated, the convolutional neural network-based predictor achieved the best scores on the independent dataset, with an accuracy of 94.7%, an AUC of 0.91 and an F1-score of 0.874, which shows the promise of the proposed approach. The iCarboxE-Deep server is deployed at https://share.streamlit.io/sheraz-n/carboxyglutamate/app.py.
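
The abstract describes networks that learn directly from peptide sequences rather than from hand-crafted features. As a minimal sketch of what such an input could look like, the Python below cuts a fixed-length window around each glutamate (E) residue and maps residues to integer indices. The window half-width `flank`, the padding symbol `X`, and the function names are assumptions made for illustration; the abstract does not state the encoding actually used.

```python
# Hypothetical sketch: prepare candidate 4-carboxyglutamate sites as
# integer-encoded peptide windows. Not the authors' published pipeline.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"                             # assumed padding symbol for sequence ends

# Index 0 is reserved for padding and any non-standard residue.
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def glutamate_windows(sequence, flank=3):
    """Return (position, window) pairs for every glutamate (E) in `sequence`.

    Each window has length 2 * flank + 1 and is centered on the glutamate;
    positions beyond the sequence ends are filled with PAD.
    """
    sequence = sequence.upper()
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for pos, residue in enumerate(sequence):
        if residue == "E":
            # In `padded`, the residue at `pos` sits at index pos + flank,
            # so the window starts at index pos.
            windows.append((pos, padded[pos:pos + 2 * flank + 1]))
    return windows


def encode(window):
    """Map a peptide window to a list of integer indices (0 = pad/unknown)."""
    return [AA_TO_INDEX.get(aa, 0) for aa in window]


if __name__ == "__main__":
    for pos, window in glutamate_windows("MKEALEE", flank=2):
        print(pos, window, encode(window))
```

Integer-encoded windows of this kind could be passed to an embedding layer followed by convolutional layers, which is one common way for a network to learn its own sequence representation.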

Identifiers

pubmed: 34996975
doi: 10.1038/s41598-021-03895-4
pii: 10.1038/s41598-021-03895-4
pmc: PMC8741832

Chemical substances

Glutamic Acid 3KX376GY7L
Proteins 0
4-carboxyglutamate 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

128

Copyright information

© 2022. The Author(s).

Authors

Sheraz Naseer (S)

Department of Computer Science, University of Management and Technology, Lahore, 54770, Pakistan.

Rao Faizan Ali (RF)

Department of Computer Science, University of Management and Technology, Lahore, 54770, Pakistan. faizan.ali@umt.edu.pk.
Computer and Information Sciences Department, Universiti Teknologi PETRONAS, 32610, Seri Iskandar, Malaysia. faizan.ali@umt.edu.pk.

Suliman Mohamed Fati (SM)

College of Computer and Information Sciences, Prince Sultan University, Riyadh, 11586, Saudi Arabia.

Amgad Muneer (A)

Computer and Information Sciences Department, Universiti Teknologi PETRONAS, 32610, Seri Iskandar, Malaysia.
