Protein complex structure modeling by cross-modal alignment between cryo-EM maps and protein sequences.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
11 Oct 2024
History:
received: 14 03 2024
accepted: 02 10 2024
medline: 12 10 2024
pubmed: 12 10 2024
entrez: 11 10 2024
Status: epublish

Abstract

Cryo-electron microscopy (cryo-EM) is widely used for protein structure determination. Current automatic methods for modeling protein complexes from cryo-EM maps mostly rely on prior chain separation. However, chain separation without sequence guidance often suffers from errors caused by cross-chain interactions or noisy densities, and these errors accumulate and mislead subsequent steps. Here, we present EModelX, a fully automated cryo-EM protein complex structure modeling method that achieves sequence-guided modeling through cross-modal alignment between cryo-EM maps and protein sequences. EModelX first employs multi-task deep learning to predict Cα atoms, backbone atoms, and amino acid types from cryo-EM maps; these predictions are then used to sample Cα traces with amino acid profiles. The profiles are aligned with protein sequences to obtain initial structural models, which yielded an average RMSD of 1.17 Å on our test set, approaching atomic-level precision in recovering PDB-deposited structures. After unmodeled gaps are filled through sequence-guided Cα threading, the final models achieved an average TM-score of 0.808, outperforming the state-of-the-art method. Combining EModelX with AlphaFold further improves the average TM-score to 0.911. Analyses comparing some EModelX-built models with PDB structures highlight its potential to improve PDB structures. EModelX is accessible at https://bio-web1.nscc-gz.cn/app/EModelX .
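
The idea of aligning an amino acid profile with a protein sequence can be illustrated with a small sketch. This is not the EModelX implementation: it assumes a deliberately simplified, ungapped sliding-window alignment scored by summed log-probabilities, and the names `peaked_profile` and `best_ungapped_alignment`, the 0.7 confidence value, and the example sequence are all hypothetical choices made for illustration.

```python
import math

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peaked_profile(residues, confidence=0.7):
    """Build a toy per-position profile (hypothetical helper).

    Each position assigns `confidence` to the given residue and spreads the
    remaining probability evenly over the other 19 amino acids, standing in
    for the amino acid types a network might predict along a C-alpha trace.
    """
    rest = (1.0 - confidence) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: (confidence if aa == r else rest) for aa in AMINO_ACIDS}
        for r in residues
    ]


def best_ungapped_alignment(profile, sequence):
    """Slide the profile along the sequence and return (offset, score).

    The score of an offset is the sum of log-probabilities that the profile
    assigns to the sequence residues it is placed against. No gaps are
    allowed, which is a simplifying assumption of this sketch.
    """
    n = len(profile)
    if n == 0 or n > len(sequence):
        raise ValueError("profile must be non-empty and no longer than the sequence")
    best_offset, best_score = -1, -math.inf
    for offset in range(len(sequence) - n + 1):
        score = 0.0
        for i, probs in enumerate(profile):
            # Floor the probability so unknown residues do not give log(0).
            p = probs.get(sequence[offset + i], 0.0)
            score += math.log(max(p, 1e-6))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


if __name__ == "__main__":
    sequence = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up example sequence
    profile = peaked_profile("AKQRQ")    # toy profile for a 5-residue trace
    offset, score = best_ungapped_alignment(profile, sequence)
    print(offset, round(score, 3))
```

A real pipeline would need to handle gaps, multiple chains, and traces whose direction is unknown; the sketch only shows how a profile turns the placement of a trace on a sequence into a scoring problem.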

Identifiers

pubmed: 39394203
doi: 10.1038/s41467-024-53116-5
pii: 10.1038/s41467-024-53116-5

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

8808

Grants

Agency: National Natural Science Foundation of China (National Science Foundation of China)
ID: T2394502

Copyright information

© 2024. The Author(s).

Authors

Sheng Chen (S)

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.

Sen Zhang (S)

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.

XiaoYu Fang (X)

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.

Liang Lin (L)

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.

Huiying Zhao (H)

Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China.

Yuedong Yang (Y)

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China. yangyd25@mail.sysu.edu.cn.
