PIPENN: protein interface prediction from sequence with an ensemble of neural nets.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
ID NLM: 9808944

Publication information

Publication date:
12 04 2022
History:
received: 03 09 2021
revised: 16 01 2022
accepted: 04 02 2022
pubmed: 13 2 2022
medline: 3 2 2023
entrez: 12 2 2022
Status: ppublish

Abstract

The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, whereas protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein-protein, protein-nucleotide and protein-small molecule interface prediction has not yet been investigated in great detail. We therefore explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features. We constructed a large dataset, dubbed BioDL, comprising protein-protein interactions from the PDB, and DNA/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures and evaluated them on the BioDL benchmarks. This evaluation shows that no single architecture performs best on all instances. An ensemble architecture that combines all six, however, consistently achieves peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 for all interaction types, achieving an AUC-ROC of 0.718 for protein-protein, 0.823 for protein-nucleotide and 0.842 for protein-small molecule interfaces. Source code and datasets are available at https://github.com/ibivu/pipenn/. Supplementary data are available at Bioinformatics online.
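To make the two central ideas of the abstract concrete (combining several per-residue predictors into one ensemble, and scoring predictions with AUC-ROC), here is a minimal sketch. It assumes the simplest possible combination rule, an unweighted mean of per-residue interface probabilities; this is a stand-in for illustration only, since this record does not describe how PIPENN actually combines its six architectures. The AUC-ROC function uses the standard rank-based definition (the probability that a randomly chosen interface residue scores higher than a randomly chosen non-interface residue). The function names `average_ensemble` and `auc_roc` and the toy scores are hypothetical.

```python
from typing import List, Sequence


def average_ensemble(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical combiner: unweighted mean of per-residue interface
    probabilities from several models (an assumption, not PIPENN's method)."""
    if not predictions:
        raise ValueError("need at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must score the same residues")
    return [sum(p[i] for p in predictions) / len(predictions) for i in range(n)]


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC-ROC: fraction of (interface, non-interface) residue
    pairs in which the interface residue scores higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three models scoring six residues (1 = interface).
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.2, 0.6, 0.1, 0.7, 0.3]
model_b = [0.8, 0.4, 0.5, 0.2, 0.6, 0.1]
model_c = [0.7, 0.3, 0.9, 0.3, 0.25, 0.2]

ensemble = average_ensemble([model_a, model_b, model_c])
print(round(auc_roc(labels, model_c), 3))   # 0.778: model_c misranks residue 4
print(auc_roc(labels, ensemble))            # 1.0: averaging corrects that error
```

In this toy case, one model misranks a single interface residue, and the averaged scores recover a perfect ranking. This mirrors, in miniature, the abstract's observation that no single architecture is best everywhere while a combination is more robust. Real per-residue evaluation would of course involve far more residues and a library implementation of AUC-ROC.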

Identifiers

pubmed: 35150231
pii: 6527621
doi: 10.1093/bioinformatics/btac071
pmc: PMC9004643

Chemical substances

Proteins 0
Nucleotides 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2111-2118

Copyright information

© The Author(s) 2022. Published by Oxford University Press.

Authors

Bas Stringer (B)

Department of Computer Science, IBIVU-Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands.

Hans de Ferrante (H)

Department of Computer Science, IBIVU-Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands.

Sanne Abeln (S)

Department of Computer Science, IBIVU-Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands.

Jaap Heringa (J)

Department of Computer Science, IBIVU-Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands.

K Anton Feenstra (KA)

Department of Computer Science, IBIVU-Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands.

Reza Haydarlou (R)

Department of Computer Science, IBIVU-Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands.

Similar articles

Selecting optimal software code descriptors-The case of Java.

Yegor Bugayenko, Zamira Kholmatova, Artem Kruglov et al.
Software Algorithms Programming Languages
Databases, Protein Protein Domains Protein Folding Proteins Deep Learning
Animals Hemiptera Insect Proteins Phylogeny Insecticides

Exploring blood-brain barrier passage using atomic weighted vector and machine learning.

Yoan Martínez-López, Paulina Phoobane, Yanaima Jauriga et al.
Blood-Brain Barrier Machine Learning Humans Support Vector Machine Software

MeSH classifications