Using deep-learning predictions reveals a large number of register errors in PDB depositions.

AlphaFold2; Protein Data Bank; deep learning; register errors; structure validation

Journal

IUCrJ
ISSN: 2052-2525
Abbreviated title: IUCrJ
Country: England
NLM ID: 101623101

Publication information

Publication date:
01 Nov 2024
History:
medline: 13 10 2024
pubmed: 13 10 2024
entrez: 10 10 2024
Status: ahead of print

Abstract

The accuracy of the information in the Protein Data Bank (PDB) is of great importance for the myriad downstream applications that make use of protein structural information. Despite best efforts, the occasional introduction of errors is inevitable, especially where the experimental data are of limited resolution. A novel protein structure validation approach based on spotting inconsistencies between the residue contacts and distances observed in a structural model and those computationally predicted by methods such as AlphaFold2 has previously been established. It is particularly well suited to the detection of register errors. Importantly, this new approach is orthogonal to traditional methods based on stereochemistry or map-model agreement, and is resolution independent. Here, thousands of likely register errors are identified by scanning 3-5 Å resolution structures in the PDB. Unlike most methods, the application of this approach yields suggested corrections to the register of affected regions, which it is shown, even by limited implementation, lead to improved refinement statistics in the vast majority of cases. A few limitations and confounding factors such as fold-switching proteins are characterized, but this approach is expected to have broad application in spotting potential issues in current accessions and, through its implementation and distribution in CCP4, helping to ensure the accuracy of future depositions.
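The idea of comparing observed and predicted contacts to suggest a register correction can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`normalise`, `agreement`, `best_shift`), the use of plain contact sets instead of predicted distances, the exhaustive scan over shifts, and the example data are all assumptions made for illustration only.

```python
def normalise(pairs):
    """Store each contact as a sorted tuple so (i, j) and (j, i) compare equal."""
    return {tuple(sorted(p)) for p in pairs}


def agreement(observed, predicted, window, shift):
    """Count predicted contacts reproduced by the model's observed contacts
    after relabelling every residue inside `window` by `shift` positions."""
    start, end = window

    def relabel(i):
        return i + shift if start <= i <= end else i

    relabelled = normalise((relabel(i), relabel(j)) for i, j in observed)
    return len(relabelled & normalise(predicted))


def best_shift(observed, predicted, window, max_shift=4):
    """Scan candidate shifts and return the one with the highest agreement.
    Ties are broken in favour of the smallest absolute shift."""
    candidates = range(-max_shift, max_shift + 1)
    return max(
        candidates,
        key=lambda s: (agreement(observed, predicted, window, s), -abs(s)),
    )


# Hypothetical toy data: two antiparallel segments (residues 10-15 and 30-35).
predicted = [(10, 35), (11, 34), (12, 33), (13, 32), (14, 31), (15, 30)]
# The same contacts as seen in a model whose second segment is labelled
# two residues too high (a register error of +2).
observed = [(i, j + 2) for i, j in predicted]

suggested = best_shift(observed, predicted, window=(30, 39))
print("suggested relabelling of window 30-39:", suggested)
```

In this toy case the unshifted model reproduces none of the predicted contacts, while relabelling the window by -2 reproduces all six, which is the kind of signal that would flag the region and propose a corrected register. A real validation tool would have to handle noisy predictions, distances rather than binary contacts, and overlapping windows.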

Identifiers

pubmed: 39387575
pii: S2052252524009114
doi: 10.1107/S2052252524009114

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: Biotechnology and Biological Sciences Research Council
ID: BB/S007105/1
Country: United Kingdom

Copyright information

open access.

Authors

Filomeno Sánchez Rodríguez (F)

Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom.

Adam J Simpkin (AJ)

Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom.

Grzegorz Chojnowski (G)

European Molecular Biology Laboratory, Hamburg Unit, Notkestrasse 85, 22607 Hamburg, Germany.

Ronan M Keegan (RM)

UKRI-STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom.

Daniel J Rigden (DJ)

Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom.

MeSH classifications