Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning.
Journal
Nature methods
ISSN: 1548-7105
Titre abrégé: Nat Methods
Pays: United States
ID NLM: 101215604
Informations de publication
Date de publication:
02 2020
02 2020
Historique:
received:
11
04
2019
accepted:
28
10
2019
pubmed:
11
12
2019
medline:
6
5
2020
entrez:
11
12
2019
Statut:
ppublish
Résumé
Predicting interactions between proteins and other biomolecules solely based on structure remains a challenge in biology. A high-level representation of protein structure, the molecular surface, displays patterns of chemical and geometric features that fingerprint a protein's modes of interactions with other biomolecules. We hypothesize that proteins participating in similar interactions may share common fingerprints, independent of their evolutionary history. Fingerprints may be difficult to grasp by visual analysis but could be learned from large-scale datasets. We present MaSIF (molecular surface interaction fingerprinting), a conceptual framework based on a geometric deep learning method to capture fingerprints that are important for specific biomolecular interactions. We showcase MaSIF with three prediction challenges: protein pocket-ligand prediction, protein-protein interaction site prediction and ultrafast scanning of protein surfaces for prediction of protein-protein complexes. We anticipate that our conceptual framework will lead to improvements in our understanding of protein function and design.
Identifiants
pubmed: 31819266
doi: 10.1038/s41592-019-0666-6
pii: 10.1038/s41592-019-0666-6
doi:
Substances chimiques
Proteins
0
Types de publication
Journal Article
Research Support, Non-U.S. Gov't
Langues
eng
Sous-ensembles de citation
IM
Pagination
184-192Références
Donald, B. R. Algorithms in Structural Molecular Biology (MIT Press, 2011).
Zhang, Q. C. et al. Structure-based prediction of protein–protein interactions on a genome-wide scale. Nature 490, 556–560 (2012).
pubmed: 23023127
pmcid: 3482288
Hermann, J. C. et al. Structure-based activity prediction for an enzyme of unknown function. Nature 448, 775–779 (2007).
pubmed: 17603473
pmcid: 2254328
Kortemme, T. et al. Computational redesign of protein–protein interaction specificity. Nat. Struct. Mol. Biol. 11, 371–379 (2004).
pubmed: 15034550
Yang, J. et al. The I-TASSER Suite: Protein Structure and Function Prediction. Nat. Methods 12, 7–8 (2015).
pubmed: 25549265
pmcid: 4428668
Planas-Iglesias, J. et al. Understanding protein–protein interactions using local structural features. J. Mol. Biol. 425, 1210–1224 (2013).
pubmed: 23353828
Cong, Q., Anishchenko, I., Ovchinnikov, S. & Baker, D. Protein interaction networks revealed by proteome coevolution. Science 365, 185–189 (2019).
pubmed: 31296772
pmcid: 6948103
Richards, F. M. Areas, volumes, packing, and protein structure. Annu. Rev. Biophysics Bioeng. 6, 151–176 (2003).
Bronstein, M.M., Bruna, J., Lecun, Y., Szlam, A. & Vandergheynst, P. Geometric Deep Learning: Going Beyond Euclidean Data. IEEE Signal Processing Magazine 34, https://doi.org/10.1109/MSP.2017.2693418 (2017).
Shulman-Peleg, A., Nussinov, R. & Wolfson, H. J. Recognition of functional sites in protein structures. J. Mol. Biol. 339, 607–633 (2004).
pubmed: 15147845
Duhovny, D., Nussinov, R. & Wolfson, H.J. Efficient unbound docking of Rigid molecules. in Proc. International Workshop on Algorithms in Bioinformatics (eds., Guigó, R. and Gusfield, D.) 2452, 185–200 (Springer, 2002); https://doi.org/10.1007/3-540-45784-4_14
Sharp, K. Electrostatic interactions in macromolecules: theory and applications. Annu. Rev. Biophys. Biomol. Struct. 19, 301–332 (1990).
Daberdaku, S. & Ferrari, C. Antibody interface prediction with 3D Zernike descriptors and SVM. Bioinformatics 35, 1870–1876 (2019).
pubmed: 30395191
Kihara, D., Sael, L., Chikhi, R. & Esquivel-Rodriguez, J. Molecular surface representation using 3D Zernike descriptors for protein shape comparison and docking. Curr. Protein Pept. Sci. 12, 520–530 (2011).
pubmed: 21787306
Zhu, X., Xiong, Y. & Kihara, D. Large-scale binding ligand prediction by improved patch-based method Patch-Surfer2.0. Bioinformatics 31, 707–713 (2015).
pubmed: 25359888
Venkatraman, V., Yang, Y. D., Sael, L. & Kihara, D. Protein–protein docking using region-based 3D Zernike descriptors. BMC Bioinformatics 10, 407 (2009).
pubmed: 20003235
pmcid: 2800122
Yin, S., Proctor, E. A., Lugovskoy, A. A. & Dokholyan, N. V. Fast screening of protein surfaces using geometric invariant fingerprints. Proc. Natl Acad. Sci. USA 106, 16622–16626 (2009).
pubmed: 19805347
Krizhevsky, A., Sutskever, I. & Hinton, G. Imagenet classification with deep convolutional neural networks. in Advances in Neural Information Processing Systems 1097–1105 (eds., F. Pereira, C.J.C. Burges, L. Bottou and K.Q. Weinberger) Curran Associates, Inc. (2012).
Monti, F. et al. Geometric deep learning on graphs and manifolds using mixture model CNNs. in Proc. 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 5425–5434 (eds., R. Chellappa, Z. Zhang, and A. Hoogs) (2017).
Masci, J., Boscaini, D., Bronstein, M. M. & Vandergheynst, P. Geodesic convolutional neural networks on Riemannian manifolds. In Proc. IEEE International Conference on Computer Vision 832–840 (eds., R. Bajcsy, G. Hager, and Y. Ma) (2015).
Sanner, M. F., Olson, A. J. & Spehner, J. Reduced surface: an efficient way to compute molecular surfaces. Biopolymers 38, 305–320 (1996).
pubmed: 8906967
Koenderink, J. J. & van Doorn, A. J. Surface shape and curvature scales. Image Vis. Comput. 10, 557–564 (1992).
Kyte, J. & Doolittle, R. F. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132 (1982).
pubmed: 7108955
Jurrus, E. et al. Improvements to the APBS biomolecular solvation software suite. Protein Sci. 27, 112–128 (2018).
pubmed: 28836357
Kortemme, T., Morozov, A. V. & Baker, D. An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein–protein complexes. J. Mol. Biol. 326, 1239–1259 (2003).
pubmed: 12589766
Chubukov, V., Gerosa, L., Kochanowski, K. & Sauer, U. Coordination of microbial metabolism. Nat. Rev. Microbiol. 12, 327–340 (2014).
pubmed: 24658329
Konc, J. et al. ProBiS-CHARMMing: web interface for prediction and optimization of ligands in protein binding sites. J. Chem. Inf. Modeling 55, 2308–2314 (2015).
Ritschel, T., Schirris, T. J. & Russel, F. G. KRIPO—a structure-based pharmacophores approach explains polypharmacological effects. J. Cheminform. 6(Suppl 1): O26. https://doi.org/10.1186/1758-2946-6-S1-O26 (2014).
Ehrt, C., Brinkjost, T. & Koch, O. A benchmark driven guide to binding site comparison: An exhaustive evaluation using tailor-made data sets(ProSPECCTs). PLoS Comput. Biol. 14(11), e1006483 (2018).
pubmed: 30408032
pmcid: 6224041
Ha, J. Y. et al. Crystal structure of d-erythronate-4-phosphate dehydrogenase complexed with NAD. J. Mol. Biol. 366, 1294–1304 (2007).
pubmed: 17217963
Gauss, G. H., Kleven, M. D., Sendamarai, A. K., Fleming, M. D. & Lawrence, C. M. The crystal structure of six-transmembrane epithelial antigen of the prostate 4 (Steap4), a ferri/cuprireductase, suggests a novel interdomain flavin-binding site. J. Biol. Chem. 288, 20668–20682 (2013).
pubmed: 23733181
pmcid: 3711330
Jones, S. & Thornton, J. M. Prediction of protein–protein interaction sites using patch analysis. J. Mol. Biol. 272, 133–143 (1997).
pubmed: 9299343
Porollo, A. & Meller, J. Prediction-based fingerprints of protein–protein interactions. Proteins 66, 630–645 (2007).
pubmed: 17152079
Northey, T. C., BarešiÄ, A. & Martin, A. C. R. IntPred: a structure-based predictor of protein–protein interaction sites. Bioinformatics 34, 223–229 (2018).
pubmed: 28968673
Xue, L. C., Dobbs, D., Bonvin, A. M. J. J. & Honavar, V. Computational prediction of protein interfaces: a review of data driven methods. FEBS Lett. 589, 3516–3526 (2015).
pubmed: 26460190
pmcid: 4655202
Murakami, Y. & Mizuguchi, K. Applying the naïve Bayes classifier with kernel density estimation to the prediction of protein–protein interaction sites. Bioinformatics 26, 1841–1848 (2010).
pubmed: 20529890
Fleishman, S. J. et al. Computational design of proteins targeting the conserved stem region of influenza hemagglutinin. Science 332, 816–821 (2011).
pubmed: 21566186
pmcid: 3164876
King, N. P. et al. Computational design of self-assembling protein nanomaterials with atomic level accuracy. Science 336, 1171–1174 (2012).
pubmed: 22654060
pmcid: 4138882
Correia, B. E. et al. Proof of principle for epitope-focused vaccine design. Nature 507, 201–206 (2014).
pubmed: 24499818
pmcid: 4260937
Muja, M. & Lowe, D. G. Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36, 2227–2240 (2014).
pubmed: 26353063
Greisen, P. J. et al. Computational design of environmental sensors for the potent opioid fentanyl. eLife 6, 1–23 (2017).
Chopra, S., Hadsell, R. & LeCun, Y. Learning a similarity metric discriminatively, with application to face verification. in Proc. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1, 539–546 (eds., M. Hebert and D. Kriegman) IEEE (2005).
Pierce, B. G., Hourai, Y. & Weng, Z. Accelerating protein docking in ZDOCK using an advanced 3D convolution library. PLoS ONE 6, e24657 (2011).
pubmed: 21949741
pmcid: 3176283
Lensink, M. F., Velankar, S. & Wodak, S. J. Modeling protein–protein and protein–peptide complexes: CAPRI 6th edition. Proteins 85, 359–377 (2017).
pubmed: 27865038
Pierce, B. & Weng, Z. A combination of rescoring and refinement significantly improves protein docking performance. Proteins 72, 270–279 (2008).
pubmed: 18214977
pmcid: 2696687
Zak, K. M. et al. Structure of the complex of human programmed death 1, PD-1, and its ligand PD-L1. Structure 23, 2341–2348 (2015).
pubmed: 26602187
pmcid: 4752817
Huang, P. S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016).
Hallen, M. A. et al. OSPREY 3.0: Open-source protein redesign for you, with powerful new features. J. Computational Chem. 39, 2494–2507 (2018).
Leaver-Fay, A. et al. in Methods in Enzymology (eds Johnson, M. J. & Brand, L.) 545–574 (Elsevier, 2010); https://doi.org/10.1016/b978-0-12-381270-4.00019-6
Word, J. M., Lovell, S. C., Richardson, J. S. & Richardson, D. C. Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J. Mol. Biol. 285, 1735–1747 (1999).
pubmed: 9917408
Zhou, Q. PyMesh—Geometry Processing Library for Python. Software available for download at https://github.com/PyMesh/PyMesh (2019).
Dolinsky, T. J. et al. PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res. 35 (suppl. 2), W522–W525 (2007).
pubmed: 17488841
pmcid: 1933214
Baker, N. A., Sept, D., Joseph, S., Holst, M. J. & McCammon, J. A. Electrostatics of nanosystems: application to microtubules and the ribosome. Proc. Natl Acad. Sci. USA 98, 10037–10041 (2001).
pubmed: 11517324
O’Connell, A. A., Borg, I. & Groenen, P. Modern multidimensional scaling: theory and applications. J. Am. Stat. Assoc. 94, 338–339 (2006).
Bonet Martínez, J. Exploiting Protein Fragments in Protein Modelling and Function Prediction (Univ. Pompeu Fabra, 2015).
Baspinar, A. et al. PRISM: a web server and repository for prediction of protein–protein interactions and modeling their 3D complexes. Nucleic Acids Res. 42, W285–W289 (2014).
pubmed: 24829450
pmcid: 4086120
Liu, Z. et al. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 31, 405–412 (2015).
pubmed: 25301850
Dunbar, J. et al. SAbDab: the structural antibody database. Nucleic Acids Res. 42, D1140–D1146 (2013).
pubmed: 24214988
pmcid: 3965125
Vreven, T. et al. Updates to the integrated protein–protein interaction benchmarks: docking Benchmark version 5 and Affinity Benchmark version 2. J. Mol. Biol. 427, 3031–3041 (2015).
pubmed: 26231283
pmcid: 4677049
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
pubmed: 23060610
pmcid: 3516142
Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
pubmed: 15849316
pmcid: 1084323
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Presented at International Conference on Learning Representations (ICLR) https://arxiv.org/abs/1412.6980 (2015).
Svoboda, J., Masci, J. & Bronstein, M. M. Palmprint recognition via discriminative index learning. In Proc. International Conference on Pattern Recognition 4232–4237 (eds. P. Gomez, S. Velastin) (2017); https://doi.org/10.1109/ICPR.2016.7900298
Zhou, Q.-Y., Park, J. & Koltun, V. Open3D: a modern library for 3D data processing. Technical report, available at: https://arxiv.org/abs/1801.09847 (2018).
Abadi, M. et al. TensorFlow: a system for large-scale machine learning. in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) 265–283 (eds., K. Keeton, T. Roscoe) (2016).
Pablo Gainza & Freyr S. LPDI-EPFL/masif: MaSIF Paper Software Release (Zenodo, 2019); https://doi.org/10.5281/zenodo.3519996
The PyMOL Molecular Graphics System v.1.8 (Schrödinger LLC, 2015).