FunFam protein families improve residue level molecular function prediction.

Binding residue prediction; CATH; Functional families; Protein binding sites; Protein families; Protein function

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
18 Jul 2019
History:
received: 18 Apr 2019
accepted: 09 Jul 2019
entrez: 20 Jul 2019
pubmed: 20 Jul 2019
medline: 14 Sep 2019
Status: epublish

Abstract

The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues. FunFam members agreed, on average, in 36.9 ± 0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding residue prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFam resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Variation of the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8 ± 0.4% for a stringent threshold. The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level.
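The consensus step described in the abstract can be illustrated with a minimal Python sketch. This is one plausible reading, not the authors' implementation: the abstract does not specify how predictions are aligned or how agreement is counted. Here we assume each family member's binary predictions are already mapped onto shared alignment columns, with `None` marking gaps. The names `consensus_binding`, `min_agreement`, `precision_recall_f1`, and the toy data are hypothetical, and no BindPredict output is used.

```python
from typing import List, Optional, Sequence, Tuple

# True = predicted binding, False = predicted non-binding, None = alignment gap
Column = Optional[bool]


def consensus_binding(
    aligned_predictions: Sequence[Sequence[Column]],
    min_agreement: float = 0.5,
) -> List[Column]:
    """Call a column 'binding' if at least min_agreement of the aligned
    (non-gap) family members predict binding there (assumed rule)."""
    n_columns = len(aligned_predictions[0])
    consensus: List[Column] = []
    for col in range(n_columns):
        votes = [row[col] for row in aligned_predictions if row[col] is not None]
        if not votes:
            consensus.append(None)  # no member is aligned at this column
            continue
        binding_fraction = sum(votes) / len(votes)
        consensus.append(binding_fraction >= min_agreement)
    return consensus


def precision_recall_f1(
    predicted: Sequence[Column], observed: Sequence[Column]
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the binding class, skipping gap columns."""
    pairs = [(p, o) for p, o in zip(predicted, observed)
             if p is not None and o is not None]
    tp = sum(1 for p, o in pairs if p and o)
    fp = sum(1 for p, o in pairs if p and not o)
    fn = sum(1 for p, o in pairs if not p and o)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy family of three members over five alignment columns (made-up values).
toy_family = [
    [True, False, True, None, False],
    [True, True, False, False, False],
    [True, False, True, False, None],
]
toy_observed = [True, False, True, False, False]

for threshold in (0.3, 0.5, 1.0):
    calls = consensus_binding(toy_family, min_agreement=threshold)
    print(threshold, calls, precision_recall_f1(calls, toy_observed))
```

Raising `min_agreement` in this toy setting trades recall for precision, which mirrors the threshold-based control of precision and recall that the abstract reports.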

Identifiers

pubmed: 31319797
doi: 10.1186/s12859-019-2988-x
pii: 10.1186/s12859-019-2988-x
pmc: PMC6639920

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

400

Authors

Linus Scheibenreif (L)

Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany. scheibenreif@rostlab.org.

Maria Littmann (M)

Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany. littmann@rostlab.org.

Christine Orengo (C)

Department of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK.

Burkhard Rost (B)

Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany.
TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany.
Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, 701 West, 168th Street, New York, NY 10032, USA.
