CytoPipeline and CytoPipelineGUI: a Bioconductor R package suite for building and visualizing automated pre-processing pipelines for flow cytometry data.
Automated data analysis pipeline
Flow cytometry
Pre-processing
Quality control
Visualization
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
20 Feb 2024
History:
received: 10 Nov 2023
accepted: 2 Feb 2024
medline: 21 Feb 2024
pubmed: 21 Feb 2024
entrez: 20 Feb 2024
Status:
epublish
Abstract
BACKGROUND
As the dimensionality of flow cytometry data has increased over the past years, there is a growing need to replace or complement traditional manual analysis (i.e. iterative 2D gating) with automated data analysis pipelines. A crucial part of these pipelines consists of pre-processing the raw data and applying quality control filtering, so that only high-quality events are used in the downstream analyses. This part can in turn be split into a number of elementary steps: signal compensation or unmixing, scale transformation, removal of debris, doublets and dead cells, batch effect correction, etc. However, assembling and assessing the pre-processing part can be challenging for several reasons. First, each elementary step can be implemented using various methods and R packages. Second, the order of the steps can affect the downstream analysis results. Finally, each method typically comes with its own non-standardized diagnostics and visualizations, making objective comparison difficult for the end user.
RESULTS
Here, we present CytoPipeline and CytoPipelineGUI, two R packages to build, compare and assess pre-processing pipelines for flow cytometry data. To illustrate these new tools, we present the steps involved in designing a pre-processing pipeline on a real-life dataset and demonstrate different visual assessment use cases. We also set up a benchmark comparing two pre-processing pipelines that differ in their quality control methods, and show how the package visualization utilities can provide crucial user insight into the obtained benchmark metrics.
CONCLUSIONS
CytoPipeline and CytoPipelineGUI are two Bioconductor R packages that help build, visualize and assess pre-processing pipelines for flow cytometry data. They increase productivity during pipeline development and testing, and complement benchmarking tools by providing intuitive insight into benchmarking results.
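The abstract's point that the order of the elementary steps can change downstream results can be illustrated with a small toy sketch. The Python code below is not the CytoPipeline API (the packages themselves are written in R); every function name, threshold, cofactor and data value is a hypothetical assumption chosen only for illustration. It runs the same three steps in two different orders and records how many events survive each step.

```python
import math
from typing import Callable, List, Tuple

# Toy event with two channels: (scatter, fluorescence). Purely illustrative.
Event = Tuple[float, float]
StepFn = Callable[[List[Event]], List[Event]]
Step = Tuple[str, StepFn]


def remove_debris(events: List[Event]) -> List[Event]:
    # Hypothetical rule: events with scatter below 50 are treated as debris.
    return [(s, f) for s, f in events if s >= 50.0]


def scale_transform(events: List[Event]) -> List[Event]:
    # arcsinh transform of the fluorescence channel (cofactor 5 is an assumption).
    return [(s, math.asinh(f / 5.0)) for s, f in events]


def remove_outliers(events: List[Event]) -> List[Event]:
    # Hypothetical QC rule: drop events whose fluorescence exceeds a fixed cut-off of 4.
    return [(s, f) for s, f in events if f <= 4.0]


def run_pipeline(steps: List[Step], events: List[Event]):
    # Apply the steps in order and record the event count after each one.
    trace = [("raw", len(events))]
    for name, fn in steps:
        events = fn(events)
        trace.append((name, len(events)))
    return events, trace


# Made-up raw events, not real cytometry data.
raw_events: List[Event] = [
    (120.0, 10.0), (30.0, 200.0), (200.0, 500.0),
    (90.0, 3.0), (15.0, 2.0), (180.0, 1000.0),
]

# Same steps, different order of the transform and the QC filter.
pipeline_a: List[Step] = [
    ("debris", remove_debris),
    ("transform", scale_transform),
    ("outliers", remove_outliers),
]
pipeline_b: List[Step] = [
    ("debris", remove_debris),
    ("outliers", remove_outliers),
    ("transform", scale_transform),
]

result_a, trace_a = run_pipeline(pipeline_a, raw_events)
result_b, trace_b = run_pipeline(pipeline_b, raw_events)

print("pipeline A:", trace_a)
print("pipeline B:", trace_b)
```

With these toy values the two orderings retain different numbers of events, because the fixed cut-off is applied to transformed values in one case and to raw values in the other. Per-step event counts are a much simpler diagnostic than the visual assessments the abstract describes, but they make the same kind of comparison between pipelines possible.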
Identifiers
pubmed: 38378440
doi: 10.1186/s12859-024-05691-z
pii: 10.1186/s12859-024-05691-z
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
80
Copyright information
© 2024. The Author(s).