Standardized Workflow for Mass-Spectrometry-Based Single-Cell Proteomics Data Processing and Analysis Using the scp Package.
Bioconductor
Data processing
Mass spectrometry
Quantitative data analysis
R
Single-cell proteomics
Journal
Methods in molecular biology (Clifton, N.J.)
ISSN: 1940-6029
Abbreviated title: Methods Mol Biol
Country: United States
NLM ID: 9214969
Publication information
Publication date: 2024
History:
medline: 22 June 2024
pubmed: 22 June 2024
entrez: 21 June 2024
Status: ppublish
Abstract
Mass-spectrometry (MS)-based single-cell proteomics (SCP) explores cellular heterogeneity by focusing on the functional effectors of the cells: proteins. However, extracting meaningful biological information from MS data is far from trivial, especially with single cells. Currently, data analysis workflows differ substantially from one research team to another. Moreover, pipelines are difficult to evaluate because ground truths are missing. Our team has developed the R/Bioconductor package scp to provide a standardized framework for SCP data analysis. It relies on the widely used QFeatures and SingleCellExperiment data structures. In addition, we used a design containing cell lines mixed in known proportions to generate controlled variability for data analysis benchmarking. In this chapter, we provide a flexible data analysis protocol for SCP data using the scp package, together with comprehensive explanations at each step of the processing. The main steps are quality control at the feature and cell level, aggregation of the raw data into peptides and proteins, normalization, and batch correction. We validate our workflow using our ground truth data set. We illustrate how to use this modular, standardized framework and highlight some crucial steps.
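The order of the main steps named in the abstract can be illustrated with a small, language-agnostic sketch. The code below is not the scp or QFeatures API and is not the chapter's implementation: it is plain Python on made-up toy values, and every function name, field name, threshold, and the batch layout is an assumption chosen for illustration. In particular, per-batch mean centering stands in for whatever batch correction method the chapter actually applies.

```python
from math import log2
from statistics import mean, median

# Hypothetical toy input: one record per peptide-spectrum match (PSM).
# None marks a missing value; all numbers are invented for illustration.
psms = [
    {"peptide": "AAK", "protein": "P1", "qvalue": 0.001,
     "intensities": {"c1": 100.0, "c2": 120.0, "c3": 210.0, "c4": 190.0, "blank": None}},
    {"peptide": "AAK", "protein": "P1", "qvalue": 0.002,
     "intensities": {"c1": 110.0, "c2": 130.0, "c3": 200.0, "c4": 180.0, "blank": None}},
    {"peptide": "CCR", "protein": "P1", "qvalue": 0.004,
     "intensities": {"c1": 90.0, "c2": 95.0, "c3": 170.0, "c4": 160.0, "blank": 5.0}},
    {"peptide": "DDK", "protein": "P2", "qvalue": 0.003,
     "intensities": {"c1": 400.0, "c2": 380.0, "c3": 800.0, "c4": 820.0, "blank": None}},
    {"peptide": "EEK", "protein": "P2", "qvalue": 0.200,
     "intensities": {"c1": 50.0, "c2": 60.0, "c3": 70.0, "c4": 80.0, "blank": 4.0}},
]
batch_of = {"c1": "A", "c2": "A", "c3": "B", "c4": "B", "blank": "A"}


def filter_features(rows, max_qvalue=0.01):
    """Feature-level QC: keep PSMs identified with sufficient confidence."""
    return [r for r in rows if r["qvalue"] <= max_qvalue]


def filter_cells(rows, min_detected=2):
    """Cell-level QC: drop cells with too few quantified features."""
    counts = {}
    for r in rows:
        for cell, value in r["intensities"].items():
            if value is not None:
                counts[cell] = counts.get(cell, 0) + 1
    keep = {cell for cell, n in counts.items() if n >= min_detected}
    return [dict(r, intensities={c: v for c, v in r["intensities"].items() if c in keep})
            for r in rows]


def aggregate(rows, key):
    """Aggregate rows sharing `key` by taking the per-cell median."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    out = []
    for name, members in groups.items():
        cells = sorted({c for m in members for c in m["intensities"]})
        agg = {}
        for c in cells:
            vals = [m["intensities"][c] for m in members
                    if m["intensities"].get(c) is not None]
            agg[c] = median(vals) if vals else None
        out.append({key: name, "protein": members[0]["protein"], "intensities": agg})
    return out


def log_normalize(rows):
    """Log2-transform, then subtract each cell's median log intensity."""
    logged = [dict(r, intensities={c: (log2(v) if v is not None else None)
                                   for c, v in r["intensities"].items()})
              for r in rows]
    cells = sorted({c for r in logged for c in r["intensities"]})
    med = {c: median([r["intensities"][c] for r in logged
                      if r["intensities"].get(c) is not None]) for c in cells}
    return [dict(r, intensities={c: (v - med[c] if v is not None else None)
                                 for c, v in r["intensities"].items()})
            for r in logged]


def center_batches(rows, batches):
    """Simplified batch correction: per protein, subtract each batch's mean."""
    out = []
    for r in rows:
        by_batch = {}
        for c, v in r["intensities"].items():
            if v is not None:
                by_batch.setdefault(batches[c], []).append(v)
        means = {b: mean(vs) for b, vs in by_batch.items()}
        out.append(dict(r, intensities={c: (v - means[batches[c]] if v is not None else None)
                                        for c, v in r["intensities"].items()}))
    return out


def run_pipeline(rows, batches):
    rows = filter_cells(filter_features(rows))
    proteins = aggregate(aggregate(rows, "peptide"), "protein")
    return center_batches(log_normalize(proteins), batches)


result = run_pipeline(psms, batch_of)
```

On this toy input, the low-confidence PSM and the nearly empty "blank" cell are removed before aggregation, so the result holds two proteins quantified in four cells. Note that mean centering per batch also removes any biological signal confounded with batch, which is one reason a design with known ground truth is useful for evaluating such choices.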
Identifiers
pubmed: 38907155
doi: 10.1007/978-1-0716-3934-4_14
Chemical substances
Proteome
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
177-220
Copyright information
© 2024. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.