FunFam protein families improve residue level molecular function prediction.
Binding residue prediction
CATH
Functional families
Protein binding sites
Protein families
Protein function
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
18 Jul 2019
History:
received: 18 Apr 2019
accepted: 09 Jul 2019
entrez: 20 Jul 2019
pubmed: 20 Jul 2019
medline: 14 Sep 2019
Status:
epublish
Abstract
BACKGROUND: The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues.
RESULTS: FunFam members agreed, on average, in 36.9 ± 0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding residue prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFam resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Variation of the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8 ± 0.4% for a stringent threshold.
CONCLUSIONS: The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level.
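The consensus step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy family, and the choice to express the threshold as the fraction of aligned members predicted as binding are all assumptions made for illustration. It shows how per-member predictions over the columns of a family alignment could be combined, and how raising the threshold trades recall for precision.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Per-column call for one family member (hypothetical encoding):
# True = predicted binding, False = predicted non-binding, None = gap.
Call = Optional[bool]


def consensus_binding(predictions: Dict[str, Sequence[Call]],
                      threshold: float) -> List[bool]:
    """Call a column binding if at least `threshold` of the aligned
    (non-gap) members are predicted as binding. Assumed rule, not the paper's."""
    n_columns = len(next(iter(predictions.values())))
    consensus = []
    for col in range(n_columns):
        votes = [p[col] for p in predictions.values() if p[col] is not None]
        fraction = sum(votes) / len(votes) if votes else 0.0
        consensus.append(fraction >= threshold)
    return consensus


def precision_recall_f1(predicted: Sequence[bool],
                        observed: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the binding class."""
    tp = sum(p and o for p, o in zip(predicted, observed))
    fp = sum(p and not o for p, o in zip(predicted, observed))
    fn = sum(o and not p for p, o in zip(predicted, observed))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy family with three members and five alignment columns (made-up data).
toy_predictions = {
    "member_a": [True, True, False, None, False],
    "member_b": [True, False, False, True, False],
    "member_c": [True, True, None, False, True],
}
toy_observed = [True, True, False, False, False]

lenient = consensus_binding(toy_predictions, threshold=0.5)
strict = consensus_binding(toy_predictions, threshold=1.0)
lenient_scores = precision_recall_f1(lenient, toy_observed)
strict_scores = precision_recall_f1(strict, toy_observed)
```

On this toy input, the lenient threshold recovers both annotated binding columns but adds one false positive, while the strict threshold keeps only the unanimously predicted column. That mirrors, in miniature, the precision/recall control the abstract attributes to varying the agreement threshold.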
Identifiers
pubmed: 31319797
doi: 10.1186/s12859-019-2988-x
pii: 10.1186/s12859-019-2988-x
pmc: PMC6639920
Chemical substances
Proteins
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
400