CATH 2024: CATH-AlphaFlow Doubles the Number of Structures in CATH and Reveals Nearly 200 New Folds.

AlphaFold2; CATH; fold; protein domain; protein structure prediction; superfamily

Journal

Journal of molecular biology
ISSN: 1089-8638
Abbreviated title: J Mol Biol
Country: Netherlands
NLM ID: 2985088R

Publication information

Publication date:
26 Mar 2024
History:
received: 31 01 2024
revised: 20 03 2024
accepted: 22 03 2024
medline: 29 3 2024
pubmed: 29 3 2024
entrez: 28 3 2024
Status: aheadofprint

Abstract

CATH (https://www.cathdb.info) classifies domain structures from experimental protein structures in the PDB and predicted structures in the AlphaFold Database (AFDB). To cope with the scale of the predicted data, a new Nextflow workflow (CATH-AlphaFlow) has been developed to classify high-quality domains into CATH superfamilies and identify novel fold groups and superfamilies. CATH-AlphaFlow uses a novel state-of-the-art structure-based domain boundary prediction method (ChainSaw) for identifying domains in multi-domain proteins. We applied CATH-AlphaFlow to process PDB structures not classified in CATH and AFDB structures from 21 model organisms, expanding CATH by over 100%. Domains not classified in existing CATH superfamilies or fold groups were used to seed novel folds, giving 253 new folds from PDB structures (September 2023 release) and 96 from AFDB structures of proteomes of 21 model organisms. Where possible, functional annotations were obtained using (i) predictions from publicly available methods and (ii) annotations from structural relatives in AFDB/UniProt50. We also predicted functional sites and highly conserved residues. Some folds are associated with important functions such as photosynthetic acclimation (in flowering plants), iron permease activity (in fungi) and post-natal spermatogenesis (in mice). CATH-AlphaFlow will allow us to identify many more CATH relatives in the AFDB, further characterising the protein structure landscape.

Identifiers

pubmed: 38548261
pii: S0022-2836(24)00146-3
doi: 10.1016/j.jmb.2024.168551

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

168551

Copyright information

Copyright © 2024 The Author(s). Published by Elsevier Ltd. All rights reserved.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Vaishali P Waman (VP)

Institute of Structural and Molecular Biology, University College London, London, United Kingdom.

Nicola Bordin (N)

Institute of Structural and Molecular Biology, University College London, London, United Kingdom.

Rachel Alcraft (R)

Advanced Research Computing Centre, University College London, London, United Kingdom.

Robert Vickerstaff (R)

Advanced Research Computing Centre, University College London, London, United Kingdom.

Clemens Rauer (C)

Institute of Structural and Molecular Biology, University College London, London, United Kingdom.

Qian Chan (Q)

Institute of Structural and Molecular Biology, University College London, London, United Kingdom.

Ian Sillitoe (I)

Institute of Structural and Molecular Biology, University College London, London, United Kingdom.

Hazuki Yamamori (H)

Institute of Structural and Molecular Biology, University College London, London, United Kingdom.

Christine Orengo (C)

Institute of Structural and Molecular Biology, University College London, London, United Kingdom. Electronic address: c.orengo@ucl.ac.uk.

MeSH classifications