ILMCNet: A Deep Neural Network Model That Uses PLM to Process Features and Employs CRF to Predict Protein Secondary Structure.


Journal

Genes
ISSN: 2073-4425
Abbreviated title: Genes (Basel)
Country: Switzerland
NLM ID: 101551097

Publication information

Publication date:
21 Oct 2024
History:
received: 07 Sep 2024
revised: 07 Oct 2024
accepted: 18 Oct 2024
medline: 26 Oct 2024
pubmed: 26 Oct 2024
entrez: 26 Oct 2024
Status: epublish

Abstract

Protein secondary structure prediction (PSSP) is a critical task in computational biology, pivotal for understanding protein function and advancing medical diagnostics. Recently, approaches that integrate multiple amino acid sequence features have gained significant attention in PSSP research. We aim to automatically extract additional features represented by evolutionary information from a large number of sequences while simultaneously incorporating positional information for more comprehensive sequence features. Additionally, we consider the interdependence between secondary structures during the prediction stage. To this end, we propose a deep neural network model, ILMCNet, which utilizes a language model and Conditional Random Field (CRF). Protein language models (PLMs) pre-trained on sequences from multiple large databases can provide sequence features that incorporate evolutionary information. ILMCNet uses positional encoding to ensure that the input features include positional information. To better utilize these features, we propose a hybrid network architecture that employs a Transformer Encoder to enhance features and integrates a feature extraction module combining a Convolutional Neural Network (CNN) with a Bidirectional Long Short-Term Memory Network (BiLSTM). This design enables deep extraction of localized features while capturing global bidirectional information. In the prediction stage, ILMCNet employs CRF to capture the interdependencies between secondary structures. Experimental results on benchmark datasets such as CB513, TS115, NEW364, CASP11, and CASP12 demonstrate that the prediction performance of our method surpasses that of comparable approaches. This study proposes a new approach to PSSP research and is expected to play an important role in other protein-related research fields, such as protein tertiary structure prediction.
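
To illustrate why a CRF layer can help in the prediction stage, the sketch below decodes a label sequence with the Viterbi algorithm for a linear-chain CRF. It is a minimal illustration under stated assumptions, not ILMCNet's implementation: the 3-state label set (H, E, C), the emission and transition scores, and the names `viterbi_decode` and `greedy_decode` are all hypothetical and do not come from the paper. In a real model the emission scores would come from the upstream network (for example the CNN-BiLSTM module) and the transition scores would be learned.

```python
from typing import List, Sequence

# Hypothetical 3-state labels (helix, strand, coil); not taken from the paper.
LABELS = ["H", "E", "C"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label path under a linear-chain CRF.

    emissions[t][j]   : score of label j at residue t.
    transitions[i][j] : score of moving from label i to label j.
    """
    if not emissions:
        return []
    n_labels = len(transitions)
    score = list(emissions[0])          # best path score ending in each label
    backpointers: List[List[int]] = []  # best previous label per step and label
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def greedy_decode(emissions: Sequence[Sequence[float]]) -> List[int]:
    """Per-residue argmax that ignores dependencies between neighbouring labels."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in emissions]


# Illustrative scores (made up): residue 1 slightly prefers strand (E) in isolation.
emissions = [
    [2.0, 0.0, 0.0],
    [1.0, 1.2, 0.0],
    [2.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
]
# Staying in the same state is rewarded; direct helix<->strand jumps are penalised.
transitions = [
    [0.5, -1.0, 0.0],
    [-1.0, 0.5, 0.0],
    [0.0, 0.0, 0.5],
]

greedy = "".join(LABELS[i] for i in greedy_decode(emissions))
decoded = "".join(LABELS[i] for i in viterbi_decode(emissions, transitions))
print(greedy)   # HEHH
print(decoded)  # HHHH
```

With these made-up scores, the per-residue argmax produces an isolated strand residue inside a helix, while joint decoding with transition scores yields a consistent helix segment. This is the kind of interdependence between neighbouring secondary-structure labels that a CRF output layer is meant to capture.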

Identifiers

pubmed: 39457474
pii: genes15101350
doi: 10.3390/genes15101350

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: Key R&D Program of Heilongjiang Province
ID: 2022ZX01A29
Agency: National Natural Science Foundation of China
ID: 62272095

Authors

Benzhi Dong (B)

College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.

Hui Su (H)

College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.

Dali Xu (D)

College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.

Chang Hou (C)

College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.

Zheng Liu (Z)

College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.

Na Niu (N)

College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.

Guohua Wang (G)

College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.
