Detection of differentially expressed genes in discrete single-cell RNA sequencing data using a hurdle model with correlated random effects.

MeSH terms: Animals; Data Interpretation, Statistical; Gene Expression Profiling / methods; Humans; Models, Statistical; Sequence Analysis, RNA / methods; Single-Cell Analysis / methods

Keywords: RNA sequencing; differential expression; single-cell; variational inference

Journal

Biometrics

ISSN: 1541-0420

Title abbreviation: Biometrics

Country: United States

NLM ID: 0370625

Publication information

Publication date:
December 2019

History:

received: 2 February 2018

accepted: 2 April 2019

pubmed: 23 April 2019

medline: 10 September 2020

entrez: 23 April 2019

Status: ppublish

Abstract

Single-cell RNA sequencing (scRNA-seq) technologies are revolutionary tools that allow researchers to examine gene expression at the level of a single cell. Traditionally, transcriptomic data have been analyzed from bulk samples, masking the heterogeneity now seen across individual cells. Even within the same cellular population, a gene can be highly expressed in some cells but not expressed (or lowly expressed) in others. Consequently, computational approaches designed for bulk RNA sequencing data are not appropriate for the analysis of scRNA-seq data. Here, we present a novel statistical model for high-dimensional, zero-inflated scRNA-seq count data to identify differentially expressed (DE) genes across cell types. Correlated random effects, based on an initial clustering of cells, are employed to capture cell-to-cell variability within treatment groups. The model is also flexible and can easily be adapted to an independent random-effect structure if needed. We apply the proposed methodology to both simulated and real data and compare the results with other popular methods for detecting DE genes. Because the hurdle model can detect differences both in the proportion of cells expressing a gene and in the average expression level among the expressing cells, our method naturally identifies some genes as DE that other methods do not, and we demonstrate with real data that these uniquely detected genes are associated with similar biological processes and functions.
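The abstract does not spell out the model's exact form, so the following is only a rough illustration under stated assumptions, not the authors' implementation. It assumes the simplest two-part (hurdle) likelihood for one gene: a cell expresses the gene (count > 0) with probability `pi`, and positive counts follow a zero-truncated negative binomial with mean parameter `mu` and dispersion `size`. The negative binomial choice and all function names are hypothetical; the correlated random effects and the variational inference used to fit them are omitted.

```python
import math


def log_nb_pmf(y: int, mu: float, size: float) -> float:
    """Log pmf of a negative binomial with mean mu and dispersion (size) parameter."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))


def log_hurdle_pmf(y: int, pi: float, mu: float, size: float) -> float:
    """Log pmf of an (assumed) hurdle model for a single cell's count.

    pi       -- probability that the gene is expressed (count > 0) in the cell
    mu, size -- parameters of the negative binomial that, truncated at zero,
                generates the positive counts
    """
    if y == 0:
        # Zero part: the gene is simply not expressed in this cell.
        return math.log(1.0 - pi)
    # Positive part: renormalise the NB over y >= 1 by dividing by 1 - P(Y = 0).
    log_p0 = log_nb_pmf(0, mu, size)
    return math.log(pi) + log_nb_pmf(y, mu, size) - math.log1p(-math.exp(log_p0))


def gene_log_likelihood(counts, pi, mu, size):
    """Sum of per-cell hurdle log-probabilities for one gene (cells independent)."""
    return sum(log_hurdle_pmf(y, pi, mu, size) for y in counts)


def two_part_summary(counts):
    """Empirical proportion of expressing cells and mean count among them."""
    positive = [y for y in counts if y > 0]
    prop = len(positive) / len(counts)
    mean_pos = sum(positive) / len(positive) if positive else float("nan")
    return prop, mean_pos
```

Because `pi` is separate from the positive-count parameters, two groups can share the same mean among expressing cells yet differ in the proportion of expressing cells, which is the kind of difference the abstract says a hurdle model can pick up. For example, `two_part_summary([0, 0, 0, 4, 6, 5])` gives `(0.5, 5.0)`, while `two_part_summary([0, 4, 6, 5, 4, 6])` gives a proportion of 5/6 with the same mean of 5.0.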

Identifiers

DOI: 10.1111/biom.13074
PMID: 31009065

Publication types

Journal Article

Languages

English

Pagination

1051-1062

Authors

Michael Sekula

Jeremy Gaskins

Susmita Datta

Similar articles

[Redispensing of expensive oral anticancer medicines: a practical application].

Smoking Cessation and Incident Cardiovascular Disease.

Evaluation of Low-Value Services Across Major Medicare Advantage Insurers and Traditional Medicare.

Effectiveness of Virtual Yoga for Chronic Low Back Pain: A Randomized Clinical Trial.