PCprophet: a framework for protein complex prediction and differential analysis using proteomic data.
Journal
Nature methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604
Publication information
Publication date:
May 2021
History:
received: 6 May 2020
accepted: 3 March 2021
pubmed: 17 April 2021
medline: 28 July 2021
entrez: 16 April 2021
Status:
ppublish
Abstract
Despite the availability of methods for analyzing protein complexes, systematic analysis of complexes under multiple conditions remains challenging. Approaches based on biochemical fractionation of intact, native complexes and correlation of protein profiles have shown promise. However, most approaches for interpreting cofractionation datasets to yield complex composition and rearrangements between samples depend considerably on protein-protein interaction inference. We introduce PCprophet, a toolkit built on size exclusion chromatography-sequential window acquisition of all theoretical mass spectrometry (SEC-SWATH-MS) data to predict protein complexes and characterize their changes across experimental conditions. We demonstrate improved performance of PCprophet over state-of-the-art approaches and introduce a Bayesian approach to analyze altered protein-protein interactions across conditions. We provide both command-line and graphical interfaces to support the application of PCprophet to any cofractionation MS dataset, independent of separation or quantitative liquid chromatography-MS workflow, for the detection and quantitative tracking of protein complexes and their physiological dynamics.
Identifiers
pubmed: 33859439
doi: 10.1038/s41592-021-01107-5
pii: 10.1038/s41592-021-01107-5
Chemical substances
Proteins
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
520-527