KiNext: a portable and scalable workflow for the identification and classification of protein kinases.
Genome annotation
Kinase
Workflow
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
25 Oct 2024
History:
received: 11 Feb 2024
accepted: 07 Oct 2024
medline: 26 Oct 2024
pubmed: 26 Oct 2024
entrez: 25 Oct 2024
Status:
epublish
Abstract
BACKGROUND
Protein kinases are a diverse superfamily of proteins found across the tree of life. They are typically involved in signal transduction, allowing organisms to sense and respond to biotic or abiotic environmental factors. They have important roles in organismal physiology, including development, reproduction and acclimation to environmental stress, and their dysregulation can lead to disease, including several forms of cancer. Identifying the complement of protein kinases (the kinome) of an organism is useful for understanding its physiological capabilities, limitations and adaptations to environmental stress. The increasing availability of genomes now makes it possible to examine and compare kinomes across a broad diversity of organisms. Here we present a pipeline that respects the FAIR principles (findable, accessible, interoperable and reusable), facilitates the search for and identification of protein kinases in a predicted proteome, and classifies them according to the groups of serine/threonine/tyrosine protein kinases present in eukaryotes.
RESULTS
KiNext is a Nextflow pipeline that combines several existing bioinformatic tools to search for and classify the protein kinases of an organism in a reproducible manner, starting from a set of amino acid sequences. Conventional eukaryotic protein kinases (ePKs) and atypical protein kinases (aPKs) are identified using Hidden Markov Models (HMMs) generated from the catalytic domains of kinases. KiNext then assigns ePKs to the eight kinase groups using dedicated HMMs tailored to each group. The performance of the pipeline was validated against previously published kinomes, obtained with other tools, of two marine species: the Pacific oyster Crassostrea gigas and the unicellular green alga Ostreococcus tauri. In both species, KiNext outperformed the previous results by finding previously unidentified kinases and by assigning a large proportion of previously unclassified kinases to a group. These results demonstrate improvements in kinase identification and classification, while providing traceability and reproducibility of results in a FAIR pipeline. The default HMMs provided with KiNext are most suitable for eukaryotes, but the pipeline can be easily modified to include HMMs for other taxa of interest.
CONCLUSIONS
The KiNext pipeline enables efficient and reproducible identification of kinomes from predicted amino acid sequences (i.e. proteomes). KiNext was designed to be easy to use, automated, portable and scalable.
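The group assignment described in the abstract (one dedicated HMM per kinase group) can be illustrated with a minimal sketch. It is not the KiNext implementation: it assumes the group HMMs were searched against the proteome with HMMER's `hmmsearch --tblout`, that each HMM is named after its group, and that a protein is assigned to the group of its highest-scoring hit. The function name, the E-value cutoff of 1e-5, the group labels and the protein identifiers below are all hypothetical choices made for illustration; the abstract does not state the rule or thresholds the pipeline actually uses.

```python
from typing import Dict, Iterable, Tuple


def best_group_per_protein(
    tblout_lines: Iterable[str],
    max_evalue: float = 1e-5,  # assumed cutoff, not taken from the paper
) -> Dict[str, Tuple[str, float]]:
    """Map each protein ID to (group, bit score) of its best-scoring HMM hit.

    Expects lines in the `hmmsearch --tblout` layout, where column 1 is the
    target (protein) name, column 3 the query (HMM) name, column 5 the
    full-sequence E-value and column 6 the full-sequence bit score.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tblout_lines:
        # Skip blank lines and the '#' comment/header lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        protein, group = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue > max_evalue:
            continue  # hit too weak to count as a kinase-domain match
        # Keep only the highest-scoring group for each protein.
        if protein not in best or score > best[protein][1]:
            best[protein] = (group, score)
    return best


# Hypothetical hits: prot1 matches two group HMMs, prot2 only weakly matches one.
example_lines = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "prot1  -  AGC   -  1e-50  170.2  0.1",
    "prot1  -  CAMK  -  1e-20   80.5  0.3",
    "prot2  -  TK    -  0.5     10.0  0.0",
]

assignments = best_group_per_protein(example_lines)
for protein, (group, score) in assignments.items():
    print(f"{protein}\t{group}\t{score}")
```

Under these assumptions, proteins with no hit below the cutoff are simply absent from the result, which would correspond to sequences left unclassified.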
Identifiers
pubmed: 39455913
doi: 10.1186/s12859-024-05953-w
pii: 10.1186/s12859-024-05953-w
Chemical substances
Protein Kinases
EC 2.7.-
Proteome
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
338
Copyright information
© 2024. The Author(s).