Rapid and accurate prediction of protein homo-oligomer symmetry with Seq2Symm.


Journal

Research Square
Abbreviated title: Res Sq
Country: United States
NLM ID: 101768035

Publication information

Publication date:
26 Apr 2024
History:
medline: 15 May 2024
pubmed: 15 May 2024
entrez: 15 May 2024
Status: epublish

Abstract

The majority of proteins must form higher-order assemblies to perform their biological functions. Despite the importance of protein quaternary structure, there are few machine learning models that can accurately and rapidly predict the symmetry of assemblies involving multiple copies of the same protein chain. Here, we address this gap by training several classes of protein foundation models, including ESM-MSA, ESM2, and RoseTTAFold2, to predict homo-oligomer symmetry. Our best model, Seq2Symm, which utilizes ESM2, outperforms existing template-based and deep learning methods. It achieves an average PR-AUC of 0.48 and 0.44 across homo-oligomer symmetries on two different held-out test sets, compared to 0.32 and 0.23 for the template-based method. Because Seq2Symm can rapidly predict homo-oligomer symmetries using a single sequence as input (~80,000 proteins/hour), we have applied it to 5 entire proteomes and ~3.5 million unlabeled protein sequences to identify patterns in protein assembly complexity across biological kingdoms and species.
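The "average PR-AUC across homo-oligomer symmetries" reported above can be read as a per-class precision-recall area averaged over symmetry classes. The sketch below illustrates one common way to compute such a number; it is not the authors' evaluation code. It assumes the step-wise average-precision estimator of PR-AUC, a one-vs-rest treatment of each class, and a single true label per protein. The function names, the class labels (C1, C2, D2), and the scores are hypothetical toy values, not data from the paper.

```python
from typing import Dict, List, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve for one class.

    y_true holds 1 for proteins that belong to the class and 0 otherwise;
    scores holds the predicted score for that class (higher = more confident).
    Tied scores keep their input order (an arbitrary choice in this sketch).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("class has no positive examples")
    # Rank proteins from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


def macro_pr_auc(labels: List[str], scores_by_class: Dict[str, List[float]]) -> float:
    """Unweighted mean of the per-class (one-vs-rest) average precision."""
    per_class = []
    for cls, scores in scores_by_class.items():
        y_true = [1 if label == cls else 0 for label in labels]
        per_class.append(average_precision(y_true, scores))
    return sum(per_class) / len(per_class)


# Toy example: four proteins, three hypothetical symmetry classes.
toy_labels = ["C1", "C2", "C1", "D2"]
toy_scores = {
    "C1": [0.9, 0.2, 0.6, 0.1],
    "C2": [0.1, 0.4, 0.5, 0.2],
    "D2": [0.0, 0.4, 0.1, 0.7],
}
print(f"macro-averaged PR-AUC: {macro_pr_auc(toy_labels, toy_scores):.3f}")
```

An unweighted mean gives rare symmetry classes the same influence as common ones, which is one reason a macro-averaged PR-AUC can look low even when the most frequent classes are predicted well.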

Identifiers

pubmed: 38746169
doi: 10.21203/rs.3.rs-4215086/v1
pmc: PMC11092833

Publication types

Preprint

Languages

eng

Authors

MeSH classifications