Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments.


Journal

Nature methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604

Publication information

Publication date:
06 2019
History:
received: 03 10 2018
accepted: 18 04 2019
pubmed: 28 5 2019
medline: 10 7 2019
entrez: 29 5 2019
Status: ppublish

Abstract

Single cell RNA-sequencing (scRNA-seq) technology has undergone rapid development in recent years, leading to an explosion in the number of tailored data analysis methods. However, the current lack of gold-standard benchmark datasets makes it difficult for researchers to systematically compare the performance of the many methods available. Here, we generated a realistic benchmark experiment that included single cells and admixtures of cells or RNA to create 'pseudo cells' from up to five distinct cancer cell lines. In total, 14 datasets were generated using both droplet and plate-based scRNA-seq protocols. We compared 3,913 combinations of data analysis methods for tasks ranging from normalization and imputation to clustering, trajectory analysis and data integration. Evaluation revealed pipelines suited to different types of data for different tasks. Our data and analysis provide a comprehensive framework for benchmarking most common scRNA-seq analysis steps.

Identifiers

pubmed: 31133762
doi: 10.1038/s41592-019-0425-8
pii: 10.1038/s41592-019-0425-8

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

479-487

Authors

Luyi Tian (L)

The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia. tian.l@wehi.edu.au.
Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia. tian.l@wehi.edu.au.

Xueyi Dong (X)

The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
College of Life Science, Zhejiang University, Hangzhou, China.

Saskia Freytag (S)

The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
Harry Perkins Institute of Medical Research, Nedlands, Western Australia, Australia.

Kim-Anh Lê Cao (KA)

Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria, Australia.

Shian Su (S)

The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.

Abolfazl JalalAbadi (A)

Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria, Australia.

Daniela Amann-Zalcenstein (D)

The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia.

Tom S Weber (TS)

The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia.

Azadeh Seidi (A)

Australian Genome Research Facility, Victorian Comprehensive Cancer Centre, Melbourne, Victoria, Australia.

Jafar S Jabbari (JS)

Australian Genome Research Facility, Victorian Comprehensive Cancer Centre, Melbourne, Victoria, Australia.

Shalin H Naik (SH)

The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia.

Matthew E Ritchie (ME)

The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia. mritchie@wehi.edu.au.
Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia. mritchie@wehi.edu.au.
