AFP-LSE: Antifreeze Proteins Prediction Using Latent Space Encoding of Composition of k-Spaced Amino Acid Pairs.
Journal
Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
28 04 2020
History:
received: 08 01 2020
accepted: 26 03 2020
entrez: 30 4 2020
pubmed: 30 4 2020
medline: 26 11 2020
Status:
epublish
Abstract
Species living in extremely cold environments resist freezing through antifreeze proteins (AFPs). Besides being essential to organisms living at sub-zero temperatures, AFPs have numerous industrial applications. AFPs share very little sequence similarity with one another and cannot be easily identified using simple similarity-search algorithms such as BLAST and PSI-BLAST. Even the AFP sub-types found in fish (Types I, II, III, and IV, and antifreeze glycoproteins (AFGPs)) show low sequence and structural similarity, which makes their accurate prediction challenging. Although several machine-learning methods have been proposed for classifying AFPs, more reliable prediction methods are required. In this paper, we propose a novel machine-learning-based approach for predicting AFP sequences using latent space learning through a deep auto-encoder. For latent space pruning, we use the output of the auto-encoder with a deep neural network classifier to learn the non-linear mapping between the protein sequence descriptor and the class label. A comprehensive ablation study is performed, and the proposed method is evaluated using widely used performance measures. On an independent test dataset, the proposed method achieved a Matthews correlation coefficient of 0.52, an F-score of 0.49, and a Youden's index of 0.81, outperforming the existing methods for AFP prediction.
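The three performance measures named in the abstract can all be derived from the counts of a binary confusion matrix. The sketch below is illustrative only and is not the authors' evaluation code: the function name binary_metrics and the example counts are hypothetical and are not taken from the paper. It uses the standard definitions, taking the F-score as the F1-score and Youden's index as sensitivity + specificity - 1.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return MCC, F1-score, and Youden's index from confusion-matrix counts."""
    # Sensitivity (recall) and specificity; guard against empty classes.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # F1-score: harmonic mean of precision and sensitivity.
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0

    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Youden's index (J statistic).
    youden = sensitivity + specificity - 1.0
    return {"mcc": mcc, "f1": f1, "youden": youden}


# Hypothetical, imbalanced counts (few positives, many negatives);
# these numbers are made up for illustration and do not come from the paper.
example = binary_metrics(tp=90, fp=200, tn=800, fn=10)
print(example)
```

On imbalanced data of this kind, Youden's index can stay high while the F-score and MCC remain moderate, because false positives drawn from a large negative class depress precision without changing specificity much.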
Identifiers
pubmed: 32345989
doi: 10.1038/s41598-020-63259-2
pii: 10.1038/s41598-020-63259-2
pmc: PMC7188683
Chemical substances
Antifreeze Proteins (registry number: 0)
Fish Proteins (registry number: 0)
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
7197