Protein complex structure modeling by cross-modal alignment between cryo-EM maps and protein sequences.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
11 Oct 2024
History:
received: 14 Mar 2024
accepted: 02 Oct 2024
medline: 12 Oct 2024
pubmed: 12 Oct 2024
entrez: 11 Oct 2024
Status:
epublish
Abstract
The cryo-electron microscopy (cryo-EM) technique is widely used for protein structure determination. Current automatic cryo-EM protein complex modeling methods mostly rely on prior chain separation. However, chain separation without sequence guidance often suffers from errors caused by cross-chain interactions or noise densities, which accumulate and mislead the subsequent steps. Here, we present EModelX, a fully automated cryo-EM protein complex structure modeling method that achieves sequence-guided modeling through cross-modal alignment between cryo-EM maps and protein sequences. EModelX first employs multi-task deep learning to predict Cα atoms, backbone atoms, and amino acid types from cryo-EM maps; these predictions are subsequently used to sample Cα traces with amino acid profiles. The profiles are then aligned with protein sequences to obtain initial structural models, which yielded an average RMSD of 1.17 Å on our test set, approaching atomic-level precision in recovering PDB-deposited structures. After unmodeled gaps were filled through sequence-guided Cα threading, the final models achieved an average TM-score of 0.808, outperforming the state-of-the-art method. Further combination with AlphaFold improves the average TM-score to 0.911. Analyses comparing some EModelX-built models with PDB structures highlight its potential to improve PDB structures. EModelX is accessible at https://bio-web1.nscc-gz.cn/app/EModelX .
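The alignment of amino acid profiles with protein sequences mentioned in the abstract can be illustrated with a toy example. The sketch below is not the EModelX algorithm: it assumes a simple ungapped, log-likelihood scoring scheme, and the names `peaked_profile`, `profile_alignment_score`, and `best_ungapped_alignment` are hypothetical helpers introduced only for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peaked_profile(residues, confidence=0.8):
    """Build a toy per-position amino acid profile (hypothetical helper).

    Each position assigns `confidence` to the given residue and spreads the
    remaining probability evenly over the other 19 amino acids. A real
    profile would come from a network's per-voxel predictions.
    """
    rest = (1.0 - confidence) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: (confidence if aa == r else rest) for aa in AMINO_ACIDS}
        for r in residues
    ]


def profile_alignment_score(profile, sequence, offset):
    """Sum the log-probabilities of the sequence residues under the profile."""
    score = 0.0
    for i, probs in enumerate(profile):
        residue = sequence[offset + i]
        # Small floor avoids log(0) for residues missing from the profile.
        score += math.log(probs.get(residue, 1e-6))
    return score


def best_ungapped_alignment(profile, sequence):
    """Return (offset, score) of the best ungapped placement on the sequence."""
    n = len(profile)
    if n > len(sequence):
        raise ValueError("trace is longer than the sequence")
    offsets = range(len(sequence) - n + 1)
    best = max(offsets, key=lambda o: profile_alignment_score(profile, sequence, o))
    return best, profile_alignment_score(profile, sequence, best)


if __name__ == "__main__":
    sequence = "MKTAYIAKQRQISFVKSHFSRQ"      # made-up example sequence
    trace_profile = peaked_profile("IAKQR")  # made-up 5-residue trace profile
    offset, score = best_ungapped_alignment(trace_profile, sequence)
    print(offset, round(score, 3))
```

In this toy setting the trace is placed where its profile best matches the sequence; a practical method would also need to handle gaps, multiple chains, and competing traces, none of which is attempted here.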
Identifiers
pubmed: 39394203
doi: 10.1038/s41467-024-53116-5
pii: 10.1038/s41467-024-53116-5
Chemical substances
Proteins
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
8808
Grants
Agency: National Natural Science Foundation of China (National Science Foundation of China)
ID: T2394502
Copyright information
© 2024. The Author(s).