Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods.
benchmark
clustering methods
high-dimensional data analysis
parameter sensitivity analysis
single-cell RNA-seq
Journal
Frontiers in genetics
ISSN: 1664-8021
Abbreviated title: Front Genet
Country: Switzerland
NLM ID: 101560621
Publication information
Publication date:
2019
History:
received: 19 July 2019
accepted: 13 November 2019
entrez: 11 January 2020
pubmed: 11 January 2020
medline: 11 January 2020
Status:
epublish
Abstract
Single-cell RNA-seq (scRNAseq) is a powerful tool for studying the heterogeneity of cells. Recently, several clustering-based methods have been proposed to identify distinct cell populations. These methods are based on different statistical models and usually require several additional steps, such as preprocessing or dimension reduction, before the clustering algorithm is applied. Individual steps are often controlled by method-specific parameters, so the same method can be run in different modes on the same dataset, depending on the user's choices. The large number of options that these methods provide can intimidate non-expert users, since the available choices are not always clearly documented. In addition, to date, no large studies have investigated the role and impact of these choices in different experimental contexts. This work aims to provide new insights into the advantages and drawbacks of scRNAseq clustering methods and to describe the range of options offered to users. In particular, we provide an extensive evaluation of several methods with respect to different modes of usage and parameter settings by applying them to real and simulated datasets that vary in dimensionality, number of cell populations, and level of noise. Remarkably, the results presented here show that much of the variability in the performance of the models is attributable to the choice of user-specified parameter settings. We describe several tendencies in performance associated with the modes of usage and the different types of datasets, and identify which methods are strongly affected by data dimensionality in terms of computational time. Finally, we highlight some open challenges in scRNAseq data clustering, such as those related to identifying the number of clusters.
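The kind of parameter sensitivity analysis described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy k-means routine stands in for a real scRNAseq clustering method, the adjusted Rand index is one possible agreement score (the abstract does not name a metric), and the 2D Gaussian data, the helper names `simulate`, `kmeans` and `sweep`, and the parameter grids are all hypothetical.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items (n >= 2)."""
    n = len(labels_true)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # both partitions trivial and identical
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def simulate(n_per_group, centers, noise, rng):
    """Hypothetical simulated dataset: one 2D Gaussian blob per cell population."""
    points, labels = [], []
    for group, (cx, cy) in enumerate(centers):
        for _ in range(n_per_group):
            points.append((rng.gauss(cx, noise), rng.gauss(cy, noise)))
            labels.append(group)
    return points, labels


def kmeans(points, k, rng, n_iter=20):
    """Toy k-means (stand-in for a real clustering method), random initial centroids."""
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        assign = [
            min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                        + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return assign


def sweep(ks=(2, 3, 4, 5), noise_levels=(0.5, 2.0), seed=0):
    """Score every (noise level, k) combination against the simulated ground truth."""
    centers = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)]  # three assumed populations
    results = {}
    for noise in noise_levels:
        rng = random.Random(seed)
        points, truth = simulate(50, centers, noise, rng)
        for k in ks:
            results[(noise, k)] = adjusted_rand_index(truth, kmeans(points, k, rng))
    return results


if __name__ == "__main__":
    for (noise, k), score in sorted(sweep().items()):
        print(f"noise={noise:<4} k={k}  ARI={score:.3f}")
```

Running the script prints one score per combination of noise level and requested number of clusters, which makes it easy to see how much a single user-chosen parameter can move the result on the same data. A real benchmark would replace the toy pieces with actual methods and datasets and would also record running time.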
Identifiers
pubmed: 31921297
doi: 10.3389/fgene.2019.01253
pmc: PMC6918801
Publication types
Journal Article
Languages
eng
Pagination
1253
Copyright information
Copyright © 2019 Krzak, Raykov, Boukouvalas, Cutillo and Angelini.