Unattained geometric configurations of secondary structure elements in protein structural space.

Protein folds; Protein structural space; Secondary structure; Secondary structure assignment

Journal

Journal of structural biology
ISSN: 1095-8657
Abbreviated title: J Struct Biol
Country: United States
NLM ID: 9011206

Publication information

Publication date:
09 2022
History:
received: 04 12 2021
revised: 14 05 2022
accepted: 17 05 2022
pubmed: 2 6 2022
medline: 9 9 2022
entrez: 1 6 2022
Status: ppublish

Abstract

Discovery of new folds in the Protein Data Bank (PDB) has all but ceased. This could be viewed as evidence that all existing protein folds have been documented. Sampling bias has, however, been presented as an alternative explanation. Furthermore, although we may know of all protein folds that do exist, we may not have documented all protein folds that could exist. While addressing completeness in the context of entire protein structures is extremely difficult, the structures can be simplified in a number of ways. One such simplification is presented: considering protein structures as a series of α helices and β sheets and analysing the geometric relationships between these successive secondary structure elements (SSEs) through torsion angles, lengths and distances. We aimed to find out whether all substructures that could be formed by triplets of these successive SSEs were represented in the PDB. When SSEs were defined with the assignment program Promotif, a gap was identified in the represented torsion angles of helix-strand-strand substructures. This gap was not present when SSEs were defined with DSSP, an alternative assignment program with a smaller minimum SSE length. We also looked at representing proteins as one-dimensional sequences of SSE types and searched for underrepresented motifs. Completely absent motifs occurred more often than expected at random. If a gap in SSE substructure space exists that could be filled or if a physically possible SSE motif is absent, associated gaps in protein structure space are implied, meaning that the PDB as we know it may not be complete.
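
The motif search mentioned in the abstract (proteins as one-dimensional sequences of SSE types) can be illustrated with a minimal sketch. This is not the authors' code: the two-letter alphabet ("H" for helix, "E" for strand), the function names and the toy sequences are all assumptions made for illustration, and the comparison against random expectation described in the abstract is not reproduced here.

```python
from collections import Counter
from itertools import product


def count_motifs(sequences, k):
    """Count every length-k window across a list of SSE-type strings."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def absent_motifs(sequences, k, alphabet="HE"):
    """Return the length-k motifs over `alphabet` that never occur.

    The alphabet "HE" (helix, strand) is a hypothetical encoding,
    not one taken from the paper.
    """
    counts = count_motifs(sequences, k)
    all_motifs = ("".join(m) for m in product(alphabet, repeat=k))
    return sorted(m for m in all_motifs if m not in counts)


if __name__ == "__main__":
    # Toy SSE strings, invented for this example.
    toy = ["HEEH", "EEHE", "HHEE"]
    print(absent_motifs(toy, 3))
```

On real data, the input strings would come from a secondary structure assignment program such as DSSP or Promotif, and an absent motif would only be informative once compared with how often absence is expected by chance.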

Identifiers

pubmed: 35649487
pii: S1047-8477(22)00040-5
doi: 10.1016/j.jsb.2022.107870

Chemical substances

Proteins 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

107870

Copyright information

Copyright © 2022 Elsevier Inc. All rights reserved.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Janan Sykes (J)

School of Natural Sciences, University of Tasmania, Australia. Electronic address: janan.sykes@utas.edu.au.

Barbara Holland (B)

School of Natural Sciences, University of Tasmania, Australia.

Michael Charleston (M)

School of Natural Sciences, University of Tasmania, Australia.
