PredPSP: a novel computational tool to discover pathway-specific photosynthetic proteins in plants.

Keywords: Bioinformatics; Computational biology; Deep learning; Machine learning; Photosynthesis; Prediction server

Journal

Plant molecular biology
ISSN: 1573-5028
Abbreviated title: Plant Mol Biol
Country: Netherlands
NLM ID: 9106343

Publication information

Publication date:
24 Sep 2024
History:
received: 16 02 2024
accepted: 04 09 2024
medline: 24 9 2024
pubmed: 24 9 2024
entrez: 24 9 2024
Status: epublish

Abstract

Photosynthetic proteins play a crucial role in agricultural productivity by harnessing light energy for plant growth. Understanding these proteins, especially within C

Identifiers

pubmed: 39316155
doi: 10.1007/s11103-024-01500-6
pii: 10.1007/s11103-024-01500-6

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

106

Copyright information

© 2024. The Author(s), under exclusive licence to Springer Nature B.V.

Authors

Prabina Kumar Meher (PK)

Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India. prabina.meher@icar.gov.in.

Upendra Kumar Pradhan (UK)

Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India.

Padma Lochan Sethi (PL)

Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar, 751003, Odisha, India.

Sanchita Naha (S)

Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India.

Ajit Gupta (A)

Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India.

Rajender Parsad (R)

ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India.

MeSH classifications