DeepMPSF: A Deep Learning Network for Predicting General Protein Phosphorylation Sites Based on Multiple Protein Sequence Features.
Journal
Journal of chemical information and modeling
ISSN: 1549-960X
Titre abrégé: J Chem Inf Model
Pays: United States
ID NLM: 101230060
Informations de publication
Date de publication:
27 Nov 2023
27 Nov 2023
Historique:
medline:
28
11
2023
pubmed:
6
11
2023
entrez:
6
11
2023
Statut:
ppublish
Résumé
Phosphorylation, as one of the most important post-translational modifications, plays a key role in various cellular physiological processes and disease occurrences. In recent years, computer technology has been gradually applied to the prediction of protein phosphorylation sites. However, most existing methods rely on simple protein sequence features that provide limited contextual information. To overcome this limitation, we propose DeepMPSF, a phosphorylation site prediction model based on multiple protein sequence features. There are two types of features: sequence semantic features, which comprise protein residue type information and relative position information within protein sequence, and protein background biophysical features, which include global semantic information containing more comprehensive protein background information obtained from pretrained models. To extract these features, DeepMPSF employs two separate subnetworks: the S71SFE module and the BBFE module, which automatically extract high-level semantic features. Our model incorporates a learning strategy for handling imbalanced datasets through ensemble learning during training and prediction. DeepMPSF is trained and evaluated on a well-established dataset of human proteins. Comparing the analysis with other benchmark methods reveals that DeepMPSF outperforms in predicting both S/T residues and Y residues. In particular, DeepMPSF showed excellent generalization performance in cross-species blind test performance, with an average improvement of 5.63%/5.72%, 22.28%/25.94%, 20.11%/17.49%, and 26.40%/28.33% for
Identifiants
pubmed: 37931253
doi: 10.1021/acs.jcim.3c00996
doi:
Substances chimiques
Proteins
0
Types de publication
Journal Article
Langues
eng
Sous-ensembles de citation
IM