MAHOMES II: A webserver for predicting if a metal binding site is enzymatic.
Keywords: enzymes; machine learning; metalloenzymes; metalloproteins
Journal
Protein science : a publication of the Protein Society
ISSN: 1469-896X
Abbreviated title: Protein Sci
Country: United States
NLM ID: 9211750
Publication information
Publication date: April 2023
History:
revised: 2023-03-08
received: 2022-12-30
accepted: 2023-03-10
pmc-release: 2024-04-01
medline: 2023-04-03
pubmed: 2023-03-15
entrez: 2023-03-14
Status:
ppublish
Abstract
Recent advances have enabled high-quality computationally generated structures for proteins with no solved crystal structures. However, protein function data remains largely limited to experimental methods and homology mapping. Since structure determines function, it is natural that methods capable of using computationally generated structures for functional annotations need to be advanced. Our laboratory recently developed a method to distinguish between metalloenzyme and nonenzyme sites. Here we report improvements to this method by upgrading our physicochemical features to alleviate the need for structures with sub-angstrom precision and using machine learning to reduce training data labeling error. Our improved classifier identifies protein bound metal sites as enzymatic or nonenzymatic with 94% precision and 92% recall. We demonstrate that both adjustments increased predictive performance and reliability on sites with sub-angstrom variations. We constructed a set of predicted metalloprotein structures with no solved crystal structures and no detectable homology to our training data. Our model had an accuracy of 90%-97.5% depending on the quality of the predicted structures included in our test. Finally, we found the physicochemical trends that drove this model's successful performance were local protein density, second shell ionizable residue burial, and the pocket's accessibility to the site. We anticipate that our model's ability to correctly identify catalytic metal sites could enable identification of new enzymatic mechanisms and improve de novo metalloenzyme design success rates.
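For readers unfamiliar with the metrics quoted in the abstract, the following minimal sketch shows how precision and recall are conventionally computed for a binary enzymatic/nonenzymatic classifier. It is not the authors' code: the function name `precision_recall` and the label lists are hypothetical, and the labels are made up for illustration rather than taken from the study.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 = enzymatic site."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels for illustration only; these are NOT data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the fraction of sites predicted enzymatic that truly are enzymatic; recall is the fraction of truly enzymatic sites that the classifier recovers.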
Identifiers
pubmed: 36916762
doi: 10.1002/pro.4626
pmc: PMC10044107
Chemical substances
Metalloproteins (registry number 0)
Metals (registry number 0)
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
e4626
Grants
Agency: NIGMS NIH HHS
ID: DP2 GM128201
Country: United States
Agency: NIGMS NIH HHS
ID: P20 GM103418
Country: United States
Comments and corrections
Type: UpdateOf
Copyright information
© 2023 The Protein Society.