A two-stage approach towards protein secondary structure classification.


Journal

Medical & biological engineering & computing
ISSN: 1741-0444
Abbreviated title: Med Biol Eng Comput
Country: United States
NLM ID: 7704869

Publication information

Publication date:
Aug 2020
History:
received: 14 09 2019
accepted: 20 05 2020
pubmed: 31 05 2020
medline: 20 05 2021
entrez: 31 05 2020
Status: ppublish

Abstract

Protein secondary structure (PSS) describes the local folded structures that form within a polypeptide due to interactions among backbone atoms. Globular proteins are generally divided into four structural classes, namely all-α, all-β, α + β, and α/β. Because nearly 90% of proteins fall into these four classes, they are the ones most often considered in computational protein classification. Classification of PSS is important for several downstream tasks, including protein fold recognition, tertiary structure prediction, prediction of DNA-binding sites, and reduction of the conformational search space. In this paper, we propose a machine learning-based model that classifies proteins by secondary structure into four classes: all-α, all-β, α + β, and α/β. To do so, we consider both sequence-based and structure-based features. First, mutual information (MI), a filter-based feature selection method, is used to remove redundant features; the selected features are then used to train three different classifiers: random forest, K-nearest neighbor (KNN), and multi-layer perceptron (MLP). Next, several standard classifier combination approaches are applied to integrate the decisions made by these classifiers, and the weighted product rule was found to perform best. The overall accuracies obtained by the proposed model on four standard datasets, namely 640, 1189, 25pdb, and fc699, are 86.89%, 92.93%, 91.38%, and 94.87%, respectively. The proposed model outperforms several state-of-the-art methods considered here for comparison. The high classification accuracy on all four datasets is attributed to a comprehensive feature set (with redundant features eliminated through feature selection) that is passed to an ensemble of three different classifiers.
Assigning different weights to the outputs of the different classifiers thus proved useful in designing a model that predicts the secondary structure class of proteins from their sequence-based and structure-based features.
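The abstract names two concrete steps: MI-based feature filtering and a weighted product rule that fuses the outputs of the three classifiers. The following is a minimal Python sketch of both steps under stated assumptions; it is not the authors' implementation. It assumes the features have already been discretised (for example, binned), and it computes MI only between each feature and the class label, which gives a relevance ranking; the abstract does not say whether the authors' criterion also measures feature-to-feature redundancy. The function names `mutual_information`, `select_top_k` and `weighted_product`, as well as the classifier weights and probability vectors, are hypothetical values chosen for illustration.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Empirical mutual information (in nats) between a discrete feature and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))   # counts of (feature value, class) pairs
    p_x = Counter(feature)                  # marginal counts of feature values
    p_y = Counter(labels)                   # marginal counts of classes
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi

def select_top_k(columns, labels, k):
    """Return indices of the k feature columns with the highest MI with the labels."""
    scores = [mutual_information(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def weighted_product(prob_vectors, weights, classes, floor=1e-12):
    """Weighted product rule: pick argmax_c of prod_i P_i(c) ** w_i.

    Computed as sum_i w_i * log P_i(c) to avoid underflow; `floor` keeps a zero
    probability from producing log(0)."""
    def score(j):
        return sum(w * math.log(max(p[j], floor)) for p, w in zip(prob_vectors, weights))
    best = max(range(len(classes)), key=score)
    return classes[best]

# Toy feature selection: column 1 separates the classes perfectly, column 0 does not.
labels = ["a", "a", "b", "b"]
columns = [[0, 1, 0, 1], [0, 0, 1, 1]]
print(select_top_k(columns, labels, 1))  # → [1]

# Hypothetical class-probability outputs of the three classifiers for one protein.
CLASSES = ["all-alpha", "all-beta", "alpha+beta", "alpha/beta"]
rf = [0.5, 0.2, 0.2, 0.1]
knn = [0.3, 0.4, 0.2, 0.1]
mlp = [0.4, 0.3, 0.2, 0.1]
print(weighted_product([rf, knn, mlp], [0.5, 0.2, 0.3], CLASSES))  # → all-alpha
print(weighted_product([rf, knn, mlp], [0.1, 0.8, 0.1], CLASSES))  # → all-beta
```

Working in log space gives the same argmax as the raw product, because the logarithm is monotonic, and it avoids underflow when many small probabilities are multiplied. The `floor` parameter is an added assumption: it stops a zero probability from one classifier from vetoing a class outright, which the plain product rule would otherwise do. Changing the weights from `[0.5, 0.2, 0.3]` to `[0.1, 0.8, 0.1]` flips the decision, which shows why the choice of per-classifier weights matters.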

Identifiers

pubmed: 32472446
doi: 10.1007/s11517-020-02194-w
pii: 10.1007/s11517-020-02194-w

Chemical substances

Peptides 0
Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1723-1737

Authors

Kushal Kanti Ghosh (KK)

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India. kushalkanti1999@gmail.com.

Soulib Ghosh (S)

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India.

Sagnik Sen (S)

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India.

Ram Sarkar (R)

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India.

Ujjwal Maulik (U)

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India.

Similar articles


Exploring blood-brain barrier passage using atomic weighted vector and machine learning.

Yoan Martínez-López, Paulina Phoobane, Yanaima Jauriga et al.
Blood-Brain Barrier Machine Learning Humans Support Vector Machine Software

Understanding the role of machine learning in predicting progression of osteoarthritis.

Simone Castagno, Benjamin Gompels, Estelle Strangmark et al.
Humans Disease Progression Machine Learning Osteoarthritis

Unsupervised learning for real-time and continuous gait phase detection.

Dollaporn Anopas, Yodchanan Wongsawat, Jetsada Arnin
Humans Gait Neural Networks, Computer Unsupervised Machine Learning Walking

MeSH classifications