A novel fusion based on the evolutionary features for protein fold recognition using support vector machines.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
01 09 2020
History:
received: 14 12 2019
accepted: 10 08 2020
entrez: 3 9 2020
pubmed: 3 9 2020
medline: 9 3 2021
Status:
epublish
Abstract
Protein fold recognition plays a crucial role in discovering the three-dimensional structure of proteins and their functions. Several approaches have been employed for the prediction of protein folds. Some of these approaches extract features from protein sequences and pass them to a strong classifier. Feature extraction techniques generally draw on syntactical, evolutionary, and physicochemical information. In recent years, finding an efficient technique for integrating discriminative features has received increasing attention. In this study, we integrate the Auto-Cross-Covariance and Separated dimer evolutionary feature extraction methods. The resulting features are scored by information gain to select a subset of discriminative features. On three benchmark datasets, DD, RDD, and EDD, the support vector machine results show an improvement of more than 6% in accuracy.
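The feature pipeline named in the abstract can be illustrated with a minimal sketch. It assumes each protein is represented by a position-specific scoring matrix (one row per residue, one column per amino acid) and uses commonly cited formulations of auto-cross-covariance, separated dimers, and information gain; the record itself does not give the formulas, so the function names, the equal-width binning, and the toy matrix below are all hypothetical. The classifier step is omitted: the scored features would be ranked, the top ones kept, and the result passed to a support vector machine.

```python
import math
from collections import Counter


def acc_features(pssm, max_lag):
    """Auto-cross-covariance between PSSM columns for lags 1..max_lag.

    Assumed formulation: the mean over positions j of
    (S[j][a] - mean_a) * (S[j + lag][b] - mean_b) for every column pair.
    """
    length, n_cols = len(pssm), len(pssm[0])
    means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    feats = []
    for lag in range(1, max_lag + 1):
        span = length - lag
        for a in range(n_cols):
            for b in range(n_cols):  # a == b is the auto-covariance term
                total = sum(
                    (pssm[j][a] - means[a]) * (pssm[j + lag][b] - means[b])
                    for j in range(span)
                )
                feats.append(total / span)
    return feats


def separated_dimer_features(pssm, gap):
    """Assumed formulation: sum over i of P[i][m] * P[i + gap][n] per column pair."""
    length, n_cols = len(pssm), len(pssm[0])
    return [
        sum(pssm[i][m] * pssm[i + gap][n] for i in range(length - gap))
        for m in range(n_cols)
        for n in range(n_cols)
    ]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, n_bins=2):
    """H(labels) - H(labels | binned feature); equal-width bins are an assumption."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    conditional = 0.0
    for b in set(bins):
        subset = [y for x, y in zip(bins, labels) if x == b]
        conditional += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - conditional


# Toy 4-residue, 2-column matrix (a real PSSM has 20 columns).
toy_pssm = [[1, 0], [0, 1], [1, 0], [0, 1]]
fused = acc_features(toy_pssm, max_lag=1) + separated_dimer_features(toy_pssm, gap=1)
print(len(fused))
```

Concatenating the two feature vectors per protein is the simplest way to fuse them; scoring each fused column with `information_gain` across the training set then gives a ranking from which the top features can be selected.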
Identifiers
pubmed: 32873824
doi: 10.1038/s41598-020-71172-x
pii: 10.1038/s41598-020-71172-x
pmc: PMC7463267
Chemical substances
Proteins
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
14368