MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy.
Journal
ACS omega
ISSN: 2470-1343
Abbreviated title: ACS Omega
Country: United States
NLM ID: 101691658
Publication information
Publication date: 07 Nov 2023
History:
received: 16 Sep 2023
revised: 11 Oct 2023
accepted: 13 Oct 2023
medline: 16 Nov 2023
pubmed: 16 Nov 2023
entrez: 16 Nov 2023
Status: epublish
Abstract
As one of the most important post-translational modifications (PTMs), lysine acetylation (Kace) plays a key role in various biological activities. Traditional experimental methods for identifying Kace sites are inefficient and expensive. As an alternative, several machine learning methods have been developed for Kace site prediction, and these methods have used hand-crafted features to encode the protein sequences. However, two challenges remain: these hand-crafted features may under-represent complex biological information, and the small sample sizes available for some species need to be addressed. We propose a novel model, MSTL-Kace, developed with a transfer learning strategy based on a pretrained Bidirectional Encoder Representations from Transformers (BERT) model. In this model, high-level embeddings are extracted from species-specific BERT models, and a two-stage fine-tuning strategy is used to deal with the small-sample issue. Specifically, a domain-specific BERT model was pretrained on all of the sequences in our data sets and then fine-tuned, or fine-tuned in two stages, on the training data set of each species to obtain the species-specific BERT models. The residue embeddings were then extracted from the fine-tuned models and fed to different downstream learning algorithms. After comparison, the best model for the six prokaryotic species was built using a random forest. The results on the independent test sets show that our model outperforms the state-of-the-art methods on all six species. The source code and data for MSTL-Kace are available at https://github.com/leo97king/MSTL-Kace.
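The abstract describes residue embeddings being extracted from a fine-tuned BERT model and passed to a downstream classifier. The following is a minimal sketch of only the feature-extraction side of such a pipeline, under stated assumptions: every lysine (K) is treated as a candidate site, each site is represented by a fixed-length window padded with a hypothetical `X` character, and `toy_embed` is a hypothetical one-hot stand-in for a fine-tuned BERT model, not the authors' encoder. The window half-width and the choice to flatten all residue vectors in the window (rather than keep only the central lysine's vector) are also assumptions, since the abstract does not specify them; the MSTL-Kace repository holds the actual implementation.

```python
from typing import Callable, List, Tuple

Vector = List[float]

# Hypothetical residue alphabet: 20 standard amino acids plus "X" for padding.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def lysine_windows(seq: str, half: int = 3, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (0-based position, window) for every lysine (K) in `seq`.

    Each window has length 2 * half + 1 and is centred on the lysine;
    sequence ends are padded with `pad` so every window has the same length.
    """
    padded = pad * half + seq + pad * half
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Position i in `seq` sits at i + half in `padded`, so the
            # centred window is padded[i : i + 2 * half + 1].
            windows.append((i, padded[i : i + 2 * half + 1]))
    return windows


def toy_embed(window: str) -> List[Vector]:
    """Hypothetical stand-in for a fine-tuned BERT model.

    Returns one one-hot vector per residue; a real model would return
    contextual embeddings instead. Residues outside ALPHABET map to all zeros.
    """
    return [[1.0 if residue == symbol else 0.0 for symbol in ALPHABET] for residue in window]


def site_features(
    seq: str,
    embed: Callable[[str], List[Vector]] = toy_embed,
    half: int = 3,
) -> List[Tuple[int, Vector]]:
    """Build one flat feature vector per candidate lysine site."""
    features = []
    for pos, window in lysine_windows(seq, half):
        per_residue = embed(window)
        # Assumption: concatenate every residue vector in the window.
        flat = [value for vector in per_residue for value in vector]
        features.append((pos, flat))
    return features


if __name__ == "__main__":
    # prints "1 105" then "3 105" (5 residues x 21 symbols per window)
    for pos, vec in site_features("MKAKL", half=2):
        print(pos, len(vec))
```

The resulting vectors could then be passed to any downstream classifier. The abstract reports that a random forest performed best, but no classifier is included in this sketch.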
Identifiers
pubmed: 37969991
doi: 10.1021/acsomega.3c07086
pmc: PMC10634282
Publication types
Journal Article
Languages
eng
Pagination
41930-41942
Copyright information
© 2023 The Authors. Published by American Chemical Society.
Conflict of interest statement
The authors declare no competing financial interest.