CytoPipeline and CytoPipelineGUI: a Bioconductor R package suite for building and visualizing automated pre-processing pipelines for flow cytometry data.

Keywords: Automated data analysis pipeline; Flow cytometry; Pre-processing; Quality control; Visualization

Journal

BMC Bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
20 Feb 2024
History:
received: 10 Nov 2023
accepted: 02 Feb 2024
medline: 21 Feb 2024
pubmed: 21 Feb 2024
entrez: 20 Feb 2024
Status: epublish

Abstract

BACKGROUND
As the dimensionality of flow cytometry data has increased over the past years, there is a growing need to replace or complement traditional manual analysis (i.e. iterative 2D gating) with automated data analysis pipelines. A crucial part of these pipelines consists of pre-processing the raw data and applying quality control filtering to it, so that only high-quality events are used in the downstream analyses. This part can in turn be split into a number of elementary steps: signal compensation or unmixing, scale transformation, removal of debris, doublets and dead cells, batch effect correction, etc. However, assembling and assessing the pre-processing part can be challenging for a number of reasons. First, each of the elementary steps can be implemented using various methods and R packages. Second, the order of the steps can have an impact on the downstream analysis results. Finally, each method typically comes with its own specific, non-standardized diagnostics and visualizations, making objective comparison difficult for the end user.

RESULTS
Here, we present CytoPipeline and CytoPipelineGUI, two R packages to build, compare and assess pre-processing pipelines for flow cytometry data. To exemplify these new tools, we present the steps involved in designing a pre-processing pipeline on a real-life dataset and demonstrate different visual assessment use cases. We also set up a benchmark comparing two pre-processing pipelines that differ in their quality control methods, and show how the package visualization utilities can provide crucial user insight into the obtained benchmark metrics.

CONCLUSIONS
CytoPipeline and CytoPipelineGUI are two Bioconductor R packages that help build, visualize and assess pre-processing pipelines for flow cytometry data. They increase productivity during pipeline development and testing, and complement benchmarking tools by providing intuitive insight into benchmarking results.
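
The abstract notes that a pre-processing pipeline is an ordered sequence of elementary steps and that the order can change downstream results. The following is a minimal toy sketch of that idea in Python; it is not the CytoPipeline API (the packages themselves are written in R), and every name, threshold and value in it (`asinh_transform`, `remove_low_signal`, the cofactor of 5, the cut-off of 2.0, the five raw values) is a hypothetical chosen only for illustration.

```python
import math
from typing import Callable, List, Tuple

# Toy data model (assumption): one channel value per event.
Events = List[float]
Step = Tuple[str, Callable[[Events], Events]]


def asinh_transform(events: Events) -> Events:
    """Toy scale transformation: arcsinh with a hypothetical cofactor of 5."""
    return [math.asinh(x / 5.0) for x in events]


def remove_low_signal(events: Events) -> Events:
    """Toy stand-in for debris removal: keep events at or above 2.0."""
    return [x for x in events if x >= 2.0]


def run_pipeline(steps: List[Step], events: Events):
    """Apply the steps in order and record the event count after each one."""
    counts = [("raw", len(events))]
    for name, fun in steps:
        events = fun(events)
        counts.append((name, len(events)))
    return events, counts


raw = [1.0, 3.0, 10.0, 50.0, 200.0]

# Same two steps, opposite order.
pipeline_a = [("transform", asinh_transform), ("debris", remove_low_signal)]
pipeline_b = [("debris", remove_low_signal), ("transform", asinh_transform)]

_, counts_a = run_pipeline(pipeline_a, raw)
_, counts_b = run_pipeline(pipeline_b, raw)

print(counts_a)  # [('raw', 5), ('transform', 5), ('debris', 2)]
print(counts_b)  # [('raw', 5), ('debris', 4), ('transform', 4)]
```

Because the fixed threshold is applied on different scales in the two orderings, the pipelines retain different numbers of events. Recording a per-step count like this is one simple way to compare pipelines step by step, in the spirit of the comparisons the abstract describes.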

Identifiers

pubmed: 38378440
doi: 10.1186/s12859-024-05691-z
pii: 10.1186/s12859-024-05691-z

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

80

Copyright information

© 2024. The Author(s).

Authors

Philippe Hauchamps (P)

Computational Biology and Bioinformatics, de Duve Institute, UCLouvain, Brussels, Belgium.

Babak Bayat (B)

GSK, Rixensart, Belgium.

Simon Delandre (S)

GSK, Rixensart, Belgium.

Mehdi Hamrouni (M)

GSK, Rixensart, Belgium.

Marie Toussaint (M)

GSK, Rixensart, Belgium.

Stephane Temmerman (S)

GSK, Rixensart, Belgium.

Dan Lin (D)

GSK, Rixensart, Belgium.

Laurent Gatto (L)

Computational Biology and Bioinformatics, de Duve Institute, UCLouvain, Brussels, Belgium. laurent.gatto@uclouvain.be.

MeSH classifications