Construct a variable-length fragment library for de novo protein structure prediction.

de novo protein structure prediction; fragment library; hidden Markov model; secondary structure

Journal

Briefings in bioinformatics
ISSN: 1477-4054
Abbreviated title: Brief Bioinform
Country: England
NLM ID: 100912837

Publication information

Publication date:
13 05 2022
History:
received: 03 01 2022
revised: 10 02 2022
accepted: 20 02 2022
pubmed: 15 3 2022
medline: 24 5 2022
entrez: 14 3 2022
Status: ppublish

Abstract

Although remarkable achievements, such as AlphaFold2, have been made in end-to-end structure prediction, fragment libraries remain essential for de novo protein structure prediction, which can help explore and understand the protein-folding mechanism. In this work, we developed a variable-length fragment library (VFlib). In VFlib, a master structure database was first constructed from the Protein Data Bank through sequence clustering. The hidden Markov model (HMM) profile of each protein in the master structure database was generated with HHsuite, and the secondary structure of each protein was calculated with DSSP. For the query sequence, the HMM profile was constructed first. Then, variable-length fragments were retrieved from the master structure database through dynamic variable-length profile-profile comparison. A complete method for chopping the query HMM profile during this process was proposed to obtain fragments with increased diversity. Finally, secondary structure information was used to further screen the retrieved fragments and generate the final fragment library for the specific query sequence. Experimental results on a set of 120 nonredundant proteins show that the global precision and coverage of the fragment library generated by VFlib were 55.04% and 94.95%, respectively, at an RMSD cutoff of 1.5 Å. Compared with the benchmark method NNMake, the global precision of our fragment library increased by 62.89% with equivalent coverage. Furthermore, the fragments generated by VFlib and NNMake were used to predict structure models through fragment assembly. Controlled experimental results demonstrate that the average TM-score of VFlib was 16.00% higher than that of NNMake.
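
The abstract reports global precision and coverage at an RMSD cutoff of 1.5 Å but does not define the two measures. The sketch below assumes a convention that is common in fragment library evaluation: precision is the fraction of library fragments whose RMSD to the native structure is below the cutoff, and coverage is the fraction of query residues spanned by at least one such fragment. These definitions, the names `Fragment` and `precision_and_coverage`, and the toy values are illustrative assumptions, not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    start: int    # 0-based index of the first query residue the fragment spans
    length: int   # number of residues; may differ from fragment to fragment
    rmsd: float   # RMSD in angstroms to the native structure over that span


def precision_and_coverage(fragments, query_length, cutoff=1.5):
    """Return (precision, coverage) as fractions in [0, 1].

    Assumed definitions (not taken from the abstract):
      precision = good fragments / all fragments
      coverage  = residues spanned by >= 1 good fragment / query length
    where a fragment is "good" if its RMSD is below the cutoff.
    """
    good = [f for f in fragments if f.rmsd < cutoff]
    precision = len(good) / len(fragments) if fragments else 0.0

    # Collect every query position spanned by at least one good fragment.
    covered = set()
    for f in good:
        covered.update(range(f.start, f.start + f.length))
    coverage = len(covered) / query_length if query_length else 0.0
    return precision, coverage


# Toy library for a hypothetical 16-residue query (made-up values).
library = [
    Fragment(start=0, length=5, rmsd=0.8),
    Fragment(start=3, length=9, rmsd=2.1),
    Fragment(start=4, length=4, rmsd=1.2),
    Fragment(start=10, length=6, rmsd=3.0),
]
precision, coverage = precision_and_coverage(library, query_length=16)
print(f"precision={precision:.2%}, coverage={coverage:.2%}")
```

Because fragments in a variable-length library differ in size, coverage is counted here per residue rather than per fragment window, so a long good fragment contributes more covered positions than a short one.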

Identifiers

pubmed: 35284936
pii: 6547572
doi: 10.1093/bib/bbac086

Chemical substances

Proteins 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Authors

Qiongqiong Feng (Q)

College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.

Minghua Hou (M)

College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.

Jun Liu (J)

College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.

Kailong Zhao (K)

College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.

Guijun Zhang (G)

College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.
