CATH functional families predict functional sites in proteins.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
23 05 2021
History:
received: 05 05 2020
revised: 30 09 2020
accepted: 27 10 2020
pubmed: 3 11 2020
medline: 9 6 2021
entrez: 2 11 2020
Status:
ppublish
Abstract
Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found that FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. Availability: https://github.com/UCL/cath-funsite-predictor. Supplementary data are available at Bioinformatics online.
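The abstract's observation that conserved residues in FunFams are enriched in functional sites can be illustrated with a per-column conservation score. The following is a minimal sketch under stated assumptions, not FunSite's actual feature computation: the function name `column_conservation`, the normalized Shannon-entropy score and the toy alignment are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Return one score in [0, 1] per alignment column (1 = fully conserved).

    Illustrative only: the score is 1 minus the Shannon entropy of the
    column, normalized by the maximum entropy over 20 amino acids.
    This is an assumed scoring scheme, not the one used by FunSite.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for i in range(length):
        # Ignore gap characters when counting residues in this column.
        residues = [seq[i] for seq in alignment if seq[i] != gap]
        if not residues:
            scores.append(0.0)  # all-gap column carries no signal
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Hypothetical toy alignment (not real FunFam data).
toy_alignment = ["HDSGK", "HDTGR", "HESGK", "HD-GK"]
scores = column_conservation(toy_alignment)
for position, score in enumerate(scores, start=1):
    print(f"column {position}: conservation = {score:.3f}")
```

In this toy case the invariant columns (the first and fourth) receive the maximum score, while columns with substitutions score lower. A real predictor would combine a score like this with structural features, as the abstract describes.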
Identifiers
pubmed: 33135053
pii: 5949022
doi: 10.1093/bioinformatics/btaa937
pmc: PMC8150129
Chemical substances
Proteins
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1099-1106
Grants
Agency: Wellcome Trust
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/P023940/1
Country: United Kingdom
Agency: Wellcome Trust
ID: 203780/Z/16/A
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/S020144/1
Country: United Kingdom
Copyright information
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.