CSM-carbohydrate: protein-carbohydrate binding affinity prediction and docking scoring function.

Keywords: binding free energy; graph-based signatures; protein–carbohydrate complex; structure-based features

Journal

Briefings in bioinformatics
ISSN: 1477-4054
Abbreviated title: Brief Bioinform
Country: England
NLM ID: 100912837

Publication information

Publication date:
17 01 2022
History:
received: 23 07 2021
revised: 06 11 2021
accepted: 08 11 2021
pubmed: 10 12 2021
medline: 8 4 2022
entrez: 9 12 2021
Status: ppublish

Abstract

Protein-carbohydrate interactions are crucial for many cellular processes but can be challenging to characterise biologically. To improve our understanding of these molecular interactions and our ability to model them, we used a carefully curated set of 370 protein-carbohydrate complexes with experimental structural and biophysical data to train and validate a new tool, cutoff scanning matrix (CSM)-carbohydrate, which uses machine learning algorithms to predict binding affinity and to rank docking poses as a scoring function. Information on both protein and carbohydrate complementarity, in terms of shape and chemistry, was captured using graph-based structural signatures. We achieved comparable Pearson's correlations of 0.72 under cross-validation on the training set [root mean square error (RMSE) of 1.58 kcal/mol] and 0.67 on the independent test set (RMSE of 1.72 kcal/mol), providing confidence in the generalisability and robustness of the final model. Similar performance was obtained across mono-, di- and oligosaccharides, further highlighting the applicability of this approach to the study of larger complexes. We show that CSM-carbohydrate significantly outperformed previous approaches. The method and all data are freely available through both a user-friendly web interface and an application programming interface for programmatic access at http://biosig.unimelb.edu.au/csm_carbohydrate/. We believe CSM-carbohydrate will be an invaluable tool for helping assess docking poses and the effects of mutations on protein-carbohydrate affinity, unravelling important aspects that drive binding recognition.
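
The two performance measures quoted in the abstract, Pearson's correlation and RMSE, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names and the affinity values below are hypothetical and serve only to show how the two metrics are computed from paired experimental and predicted binding affinities in kcal/mol.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical affinities in kcal/mol, invented for illustration only.
experimental = [-5.0, -6.5, -7.2, -4.1, -8.0]
predicted = [-5.5, -6.0, -6.8, -5.0, -7.1]

print(f"Pearson r: {pearson_r(experimental, predicted):.2f}")
print(f"RMSE: {rmse(experimental, predicted):.2f} kcal/mol")
```

Pearson's correlation measures how well predictions track the trend in the experimental values, while RMSE reports the typical size of the error in the same units as the affinities, which is why the abstract gives both.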

Identifiers

pubmed: 34882232
pii: 6457169
doi: 10.1093/bib/bbab512
pmc: PMC8769910

Chemical substances

Carbohydrates 0
Ligands 0
Proteins 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Grants

Agency: Wellcome Trust
Country: United Kingdom
Agency: Medical Research Council
ID: MR/M026302/1
Country: United Kingdom
Agency: Wellcome Trust
ID: 093167/Z/10/Z
Country: United Kingdom

Copyright information

© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Authors

Thanh Binh Nguyen (TB)

Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.
Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia.
School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia.

Douglas E V Pires (DEV)

Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.
Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia.
School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia.

David B Ascher (DB)

Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.
Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia.
School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia.
Department of Biochemistry, University of Cambridge, Cambridge, UK.
