ContactLib-ATT: a structure-based search engine for homologous proteins.


Journal

IEEE/ACM transactions on computational biology and bioinformatics
ISSN: 1557-9964
Abbreviated title: IEEE/ACM Trans Comput Biol Bioinform
Country: United States
NLM ID: 101196755

Publication information

Publication date:
10 Aug 2022
History:
entrez: 10 Aug 2022
pubmed: 11 Aug 2022
medline: 11 Aug 2022
Status: ahead of print

Abstract

General-purpose protein structure embedding can be used for many important protein biology tasks, such as protein design, drug design, and binding affinity prediction. Recent research has shown that attention-based encoder layers are more suitable for learning high-level features. Based on this key observation, we propose a two-level general-purpose protein structure embedding neural network called ContactLib-ATT. At the local embedding level, a biologically more meaningful contact context is introduced. At the global embedding level, attention-based encoder layers are employed for better global representation learning. Our general-purpose protein structure embedding framework is trained and tested on the SCOP40 2.07 dataset. ContactLib-ATT achieves a SCOP superfamily classification accuracy of 82.4%, which is 6.7% higher than the state-of-the-art method. On the same dataset, ContactLib-ATT is used to simulate a structure-based search engine for remote homologous proteins, and our top-10 candidate list contains at least one remote homolog with a probability of 91.9%.
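The top-10 search result above can be made concrete with a small sketch of embedding-based retrieval and a top-k hit rate. This is not the authors' evaluation code: it assumes each protein already has a fixed-length embedding vector (in the paper these come from the network), ranks database entries by cosine similarity, and, as a labeled assumption, counts a "remote homolog" as a protein in the same SCOP superfamily but a different family. The names `top_k`, `top_k_hit_rate`, and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical label format: (superfamily, family) per protein ID.
Label = Tuple[str, str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def top_k(query_id: str, embeddings: Dict[str, List[float]], k: int = 10) -> List[str]:
    """Rank every other protein by similarity to the query; keep the best k."""
    q = embeddings[query_id]
    scored = [(cosine(q, v), pid) for pid, v in embeddings.items() if pid != query_id]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by ID
    return [pid for _, pid in scored[:k]]


def is_remote_homolog(a: Label, b: Label) -> bool:
    """Assumption: same superfamily, different family."""
    return a[0] == b[0] and a[1] != b[1]


def top_k_hit_rate(embeddings: Dict[str, List[float]],
                   labels: Dict[str, Label], k: int = 10) -> float:
    """Fraction of queries whose top-k list holds at least one remote homolog.

    Queries with no remote homolog anywhere in the database are skipped.
    """
    hits = total = 0
    for qid in embeddings:
        if not any(is_remote_homolog(labels[qid], labels[o])
                   for o in embeddings if o != qid):
            continue
        total += 1
        if any(is_remote_homolog(labels[qid], labels[c])
               for c in top_k(qid, embeddings, k)):
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy 2-D embeddings; real embeddings would come from a trained model.
    emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0],
           "d": [0.1, 0.9], "e": [0.7, 0.7]}
    lab = {"a": ("sf1", "f1"), "b": ("sf1", "f2"), "c": ("sf2", "f3"),
           "d": ("sf2", "f4"), "e": ("sf3", "f5")}
    print(top_k("a", emb, k=2))          # ['b', 'e']
    print(top_k_hit_rate(emb, lab, k=1))  # 1.0 ('e' is skipped: no remote homolog)
```

With k = 10 and real embeddings, `top_k_hit_rate` corresponds to the kind of "at least one remote homolog in the top-10 list" probability reported above, under the stated definition of remote homology.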

Identifiers

pubmed: 35947567
doi: 10.1109/TCBB.2022.3197802

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Authors

MeSH classifications