FunFam protein families improve residue level molecular function prediction.

Binding Sites; Databases, Protein; Protein Binding; Protein Domains; Proteins / chemistry

Binding residue prediction; CATH; Functional families; Protein binding sites; Protein families; Protein function

Journal

BMC bioinformatics

ISSN: 1471-2105

Abbreviated title: BMC Bioinformatics

Country: England

NLM ID: 100965194

Publication information

Publication date:
18 Jul 2019

History:

received: 18 Apr 2019

accepted: 09 Jul 2019

entrez: 20 Jul 2019

pubmed: 20 Jul 2019

medline: 14 Sep 2019

Status: epublish

Abstract

The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues. FunFam members agreed, on average, in 36.9 ± 0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding residue prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFam resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Variation of the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8 ± 0.4% for a stringent threshold. The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level.
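The consensus step and the precision/recall trade-off described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the input layout (one row of per-column predictions per FunFam member, with `None` for alignment gaps), the function names `consensus_binding` and `precision_recall_f1`, the parameter `min_fraction`, and the toy data are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input layout: one row per FunFam member, one entry per
# alignment column. True = predicted binding, False = predicted
# non-binding, None = gap (the member has no residue in this column).
Alignment = Dict[str, List[Optional[bool]]]


def consensus_binding(predictions: Alignment, min_fraction: float) -> List[bool]:
    """Call a column binding if at least `min_fraction` of the aligned
    (non-gap) members are predicted as binding in that column."""
    n_columns = len(next(iter(predictions.values())))
    consensus = []
    for col in range(n_columns):
        votes = [row[col] for row in predictions.values() if row[col] is not None]
        fraction = sum(votes) / len(votes) if votes else 0.0
        consensus.append(fraction >= min_fraction)
    return consensus


def precision_recall_f1(predicted: List[bool], observed: List[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the binding (positive) class."""
    tp = sum(p and o for p, o in zip(predicted, observed))
    fp = sum(p and not o for p, o in zip(predicted, observed))
    fn = sum(o and not p for p, o in zip(predicted, observed))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (invented): three aligned members and observed annotations.
toy_predictions: Alignment = {
    "member_A": [True, True, False, None, False],
    "member_B": [True, False, False, True, False],
    "member_C": [True, True, False, False, True],
}
toy_observed = [True, True, False, False, False]

for threshold in (0.5, 1.0):
    calls = consensus_binding(toy_predictions, threshold)
    print(threshold, calls, precision_recall_f1(calls, toy_observed))
```

In this toy example, raising `min_fraction` from 0.5 to 1.0 removes a false positive but also loses a true binding column, so precision rises while recall falls. This is the kind of control over precision and coverage that the abstract attributes to varying the agreement threshold.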

Identifiers

DOI: 10.1186/s12859-019-2988-x

PMID: 31319797

PMC: PMC6639920

PII: 10.1186/s12859-019-2988-x

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Pagination

400

Authors

Linus Scheibenreif (L)

Maria Littmann (M)

Christine Orengo (C)

Burkhard Rost (B)
