Improving replicability in single-cell RNA-Seq cell type discovery with Dune.

MeSH terms: Single-Cell Analysis / methods; Software; RNA-Seq / methods; Cluster Analysis; Algorithms; Sequence Analysis, RNA / methods; Humans; Transcriptome / genetics; Reproducibility of Results; Gene Expression Profiling / methods; Single-Cell Gene Expression Analysis

Keywords: Clustering; Consensus clustering; Replicability; ScRNA-Seq; Single-cell

Journal

BMC bioinformatics

ISSN: 1471-2105

Abbreviated title: BMC Bioinformatics

Country: England

NLM ID: 100965194

Publication information

Publication date:
24 May 2024

History:

received: 14 Sep 2023

accepted: 17 May 2024

medline: 25 May 2024

pubmed: 25 May 2024

entrez: 24 May 2024

Status: epublish

Abstract

Single-cell transcriptome sequencing (scRNA-Seq) has allowed new types of investigations at unprecedented levels of resolution. Among the primary goals of scRNA-Seq is the classification of cells into distinct types. Many approaches build on the existing clustering literature to develop tools specific to single-cell data. However, almost all of these methods rely on heuristics or user-supplied parameters to control the number of clusters. This affects both the resolution of the clusters within the original dataset and their replicability across datasets. While many recommendations exist, there is in general little assurance that any given set of parameters represents an optimal choice in the trade-off between cluster resolution and replicability. For instance, another set of parameters may result in more clusters that are also more replicable. Here, we propose Dune, a new method for optimizing the trade-off between the resolution of the clusters and their replicability. Our method takes as input a set of clustering results (or partitions) on a single dataset and iteratively merges clusters within each partition in order to maximize the concordance between partitions. As demonstrated on multiple datasets from different platforms, Dune outperforms existing techniques that rely on hierarchical merging to reduce the number of clusters, both in replicability of the resulting merged clusters and in concordance with ground truth. Dune is available as an R package on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/Dune.html). Cluster refinement by Dune helps improve the robustness of any clustering analysis and reduces the reliance on tuning parameters. This method provides an objective approach for borrowing information across multiple clusterings to generate replicable clusters most likely to represent common biological features across multiple datasets.
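The merging idea described in the abstract can be illustrated with a minimal Python sketch. This is not the Dune R package or its API. It assumes that concordance is measured by the mean pairwise adjusted Rand index (ARI), which the abstract does not specify, and that merges are chosen greedily. The function names `adjusted_rand_index`, `mean_pairwise_ari` and `merge_to_maximize_concordance` are hypothetical, introduced only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two equal-length label lists (needs >= 2 cells)."""
    n = len(a)
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate case: both partitions are trivial
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)


def mean_pairwise_ari(partitions):
    """Average ARI over all pairs of partitions (assumed concordance measure)."""
    pairs = list(combinations(partitions, 2))
    return sum(adjusted_rand_index(p, q) for p, q in pairs) / len(pairs)


def merge_to_maximize_concordance(partitions):
    """Greedy sketch: repeatedly apply the single within-partition merge of two
    clusters that most increases the mean pairwise ARI; stop when none helps."""
    partitions = [list(p) for p in partitions]
    best = mean_pairwise_ari(partitions)
    while True:
        best_move = None
        for i, labels in enumerate(partitions):
            for keep, drop in combinations(sorted(set(labels)), 2):
                # Candidate: relabel cluster `drop` as `keep` in partition i only.
                merged = [keep if x == drop else x for x in labels]
                candidate = partitions[:i] + [merged] + partitions[i + 1:]
                score = mean_pairwise_ari(candidate)
                if score > best + 1e-12:
                    best, best_move = score, (i, merged)
        if best_move is None:
            return partitions, best
        i, merged = best_move
        partitions[i] = merged


# Toy input: two partitions of the same eight cells at different resolutions.
coarse = list("aaaabbbb")
fine = list("xxyyzzzz")
merged, score = merge_to_maximize_concordance([coarse, fine])
print(merged[1], round(score, 3))
```

In this toy case the only merge that raises concordance is joining clusters `x` and `y` in the finer partition, after which the two partitions agree and no further merge helps. Each step rescans every cluster pair in every partition, so the sketch is meant for reading, not for large datasets.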

Identifiers

DOI: 10.1186/s12859-024-05814-6

PMID: 38789920

pii: 10.1186/s12859-024-05814-6

Publication types

Journal Article

Languages

eng

Pagination

198

Grants

Agency: Fonds Wetenschappelijk Onderzoek

ID: 1246220N

Agency: NIH HHS

ID: U19MH114821

Country: United States

Agency: NIH HHS

ID: U19MH114830

Country: United States

Authors

Hector Roux de Bézieux

Kelly Street

Stephan Fischer

Koen Van den Berge

Rebecca Chance

Davide Risso

Jesse Gillis

John Ngai

Elizabeth Purdom

Sandrine Dudoit
