CSM-carbohydrate: protein-carbohydrate binding affinity prediction and docking scoring function.
binding free energy
graph-based signatures
protein–carbohydrate complex
structure-based features
Journal
Briefings in bioinformatics
ISSN: 1477-4054
Abbreviated title: Brief Bioinform
Country: England
NLM ID: 100912837
Publication information
Publication date:
17 01 2022
History:
received: 23 07 2021
revised: 06 11 2021
accepted: 08 11 2021
pubmed: 10 12 2021
medline: 8 4 2022
entrez: 9 12 2021
Status:
ppublish
Abstract
Protein-carbohydrate interactions are crucial for many cellular processes but can be challenging to characterise biologically. To improve our understanding of these molecular interactions and our ability to model them, we used a carefully curated set of 370 protein-carbohydrate complexes with experimental structural and biophysical data to train and validate a new tool, cutoff scanning matrix (CSM)-carbohydrate, which uses machine learning algorithms to predict binding affinity and to rank docking poses as a scoring function. Information on protein and carbohydrate complementarity, in terms of shape and chemistry, was captured using graph-based structural signatures. We achieved comparable Pearson's correlations of 0.72 under cross-validation [root mean square error (RMSE) of 1.58 kcal/mol] and 0.67 on the independent test set (RMSE of 1.72 kcal/mol), providing confidence in the generalisability and robustness of the final model. Similar performance was obtained across mono-, di- and oligosaccharides, further highlighting the applicability of this approach to the study of larger complexes. We show that CSM-carbohydrate significantly outperformed previous approaches. The method and all data are freely available through both a user-friendly web interface and an application programming interface, which facilitates programmatic access, at http://biosig.unimelb.edu.au/csm_carbohydrate/. We believe CSM-carbohydrate will be an invaluable tool for helping assess docking poses and the effects of mutations on protein-carbohydrate affinity, unravelling important aspects that drive binding recognition.
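The two evaluation metrics quoted in the abstract, Pearson's correlation and RMSE, can be computed as in the minimal sketch below. This is not the authors' evaluation code: the function names `pearson_r` and `rmse` are chosen here for illustration, and the affinity values are hypothetical numbers, not data from the study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


if __name__ == "__main__":
    # Hypothetical binding free energies in kcal/mol (illustrative only).
    experimental = [-5.2, -6.8, -4.1, -7.5, -3.9]
    model_output = [-5.9, -6.1, -4.8, -6.9, -4.6]
    print(f"Pearson r: {pearson_r(experimental, model_output):.2f}")
    print(f"RMSE: {rmse(experimental, model_output):.2f} kcal/mol")
```

Reporting both metrics is useful because they answer different questions: the correlation reflects how well predictions rank the complexes, while RMSE reflects the typical size of the error in kcal/mol.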
Identifiers
pubmed: 34882232
pii: 6457169
doi: 10.1093/bib/bbab512
pmc: PMC8769910
Chemical substances
Carbohydrates
0
Ligands
0
Proteins
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Grants
Agency: Wellcome Trust
Country: United Kingdom
Agency: Medical Research Council
ID: MR/M026302/1
Country: United Kingdom
Agency: Wellcome Trust
ID: 093167/Z/10/Z
Country: United Kingdom
Copyright information
© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.