FilterDCA: Interpretable supervised contact prediction using inter-domain coevolution.


Journal

PLoS computational biology
ISSN: 1553-7358
Abbreviated title: PLoS Comput Biol
Country: United States
NLM ID: 101238922

Publication information

Publication date:
10 2020
History:
received: 20 12 2019
accepted: 20 08 2020
revised: 21 10 2020
pubmed: 10 10 2020
medline: 28 1 2021
entrez: 9 10 2020
Status: epublish

Abstract

Predicting three-dimensional protein structure and assembling protein complexes from sequence information are among the most prominent tasks in computational biology. Recently, substantial progress has been made for single proteins by combining unsupervised coevolutionary sequence analysis with structurally supervised deep learning. While it reaches impressive accuracy in predicting residue-residue contacts, deep learning has a number of disadvantages. The need for large structural training sets limits its applicability to multi-protein complexes, and the deep architecture of convolutional neural networks makes them intrinsically hard to interpret. Here we introduce FilterDCA, a simpler supervised predictor for inter-domain and inter-protein contacts. It is based on the fact that contact maps of proteins show typical contact patterns, which result from secondary structure and are reflected by patterns in coevolutionary analysis. We explicitly integrate averaged contact patterns with coevolutionary scores derived by Direct Coupling Analysis, improving performance over standard coevolutionary analysis while remaining fully transparent and interpretable. The FilterDCA code is available at http://gitlab.lcqb.upmc.fr/muscat/FilterDCA.
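
The idea of combining averaged contact patterns with coevolutionary scores can be illustrated with a minimal sketch. It is not the FilterDCA implementation: the function name `filter_score_map`, the 3x3 patterns, and the toy score map are hypothetical choices made for illustration, and the cross-correlation shown here is only one plausible way to compare a local window of scores with a pattern.

```python
import numpy as np


def filter_score_map(scores, pattern):
    """Cross-correlate a score map with a small pattern.

    Hypothetical helper: the map is zero-padded so the output has the
    same shape as the input, and each output entry is the pattern-weighted
    sum of the scores in the window centred on that residue pair.
    """
    scores = np.asarray(scores, dtype=float)
    pattern = np.asarray(pattern, dtype=float)
    kh, kw = pattern.shape
    if kh % 2 == 0 or kw % 2 == 0:
        raise ValueError("pattern must have odd dimensions")
    ph, pw = kh // 2, kw // 2
    padded = np.pad(scores, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros_like(scores)
    n_rows, n_cols = scores.shape
    # Accumulate one shifted copy of the map per pattern entry.
    for di in range(kh):
        for dj in range(kw):
            out += pattern[di, dj] * padded[di:di + n_rows, dj:dj + n_cols]
    return out


# Toy score map (assumed values): a short diagonal stripe of high scores,
# the kind of signal two paired structural elements might produce.
toy_scores = np.zeros((5, 5))
for k in (1, 2, 3):
    toy_scores[k, k] = 1.0

# Two assumed patterns: one along the diagonal, one along the anti-diagonal.
diagonal_pattern = np.eye(3)
antidiagonal_pattern = np.fliplr(np.eye(3))

diag_response = filter_score_map(toy_scores, diagonal_pattern)
anti_response = filter_score_map(toy_scores, antidiagonal_pattern)
```

In this toy case the pattern that matches the orientation of the stripe gives the larger response at the centre of the map, which is the sense in which a pattern-based filter can reinforce scores that are supported by their neighbourhood.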

Identifiers

pubmed: 33035205
doi: 10.1371/journal.pcbi.1007621
pii: PCOMPBIOL-D-19-02208
pmc: PMC7577475

Chemical substances

Proteins 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e1007621

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Maureen Muscat (M)

Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative - LCQB, 75005 Paris, France.

Giancarlo Croce (G)

Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative - LCQB, 75005 Paris, France.

Edoardo Sarti (E)

Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative - LCQB, 75005 Paris, France.

Martin Weigt (M)

Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative - LCQB, 75005 Paris, France.
