Detection of differentially expressed genes in discrete single-cell RNA sequencing data using a hurdle model with correlated random effects.


Journal

Biometrics
ISSN: 1541-0420
Abbreviated title: Biometrics
Country: United States
NLM ID: 0370625

Publication information

Publication date:
12 2019
History:
received: 02 02 2018
accepted: 02 04 2019
pubmed: 23 4 2019
medline: 10 9 2020
entrez: 23 4 2019
Status: ppublish

Abstract

Single-cell RNA sequencing (scRNA-seq) technologies are revolutionary tools allowing researchers to examine gene expression at the level of a single cell. Traditionally, transcriptomic data have been analyzed from bulk samples, masking the heterogeneity now seen across individual cells. Even within the same cellular population, genes can be highly expressed in some cells but not expressed (or lowly expressed) in others. Therefore, the computational approaches used to analyze bulk RNA sequencing data are not appropriate for the analysis of scRNA-seq data. Here, we present a novel statistical model for high dimensional and zero-inflated scRNA-seq count data to identify differentially expressed (DE) genes across cell types. Correlated random effects are employed based on an initial clustering of cells to capture the cell-to-cell variability within treatment groups. Moreover, this model is flexible and can be easily adapted to an independent random effect structure if needed. We apply our proposed methodology to both simulated and real data and compare results to other popular methods designed for detecting DE genes. Due to the hurdle model's ability to detect differences in the proportion of cells expressed and the average expression level (among the expressed cells), our methods naturally identify some genes as DE that other methods do not, and we demonstrate with real data that these uniquely detected genes are associated with similar biological processes and functions.
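The two-part structure of a hurdle model described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it omits the correlated random effects entirely, and the zero-truncated Poisson used for the positive counts is an assumption chosen for simplicity. The function names `hurdle_summary` and `hurdle_loglik` and the toy count vectors are hypothetical.

```python
import math


def hurdle_summary(counts):
    """Return (proportion of expressed cells, mean count among expressed cells)."""
    expressed = [c for c in counts if c > 0]
    proportion = len(expressed) / len(counts)
    # The positive-part mean is undefined when no cell expresses the gene.
    positive_mean = sum(expressed) / len(expressed) if expressed else float("nan")
    return proportion, positive_mean


def hurdle_loglik(counts, pi, lam):
    """Log-likelihood of a simple hurdle model (illustrative assumption).

    A cell expresses the gene with probability pi; given expression, the
    count follows a zero-truncated Poisson with rate lam.
    """
    loglik = 0.0
    for c in counts:
        if c == 0:
            loglik += math.log(1.0 - pi)
        else:
            # Zero-truncated Poisson: lam^c * exp(-lam) / (c! * (1 - exp(-lam)))
            loglik += (
                math.log(pi)
                + c * math.log(lam)
                - lam
                - math.lgamma(c + 1)
                - math.log1p(-math.exp(-lam))
            )
    return loglik


if __name__ == "__main__":
    # Hypothetical counts for one gene in two groups of six cells each.
    group_a = [0, 0, 0, 0, 8, 8]
    group_b = [0, 4, 4, 4, 4, 0]
    print(hurdle_summary(group_a))
    print(hurdle_summary(group_b))
```

In this toy example both groups have the same overall mean count (16/6), yet the proportion of expressed cells is 1/3 versus 2/3 and the mean among expressed cells is 8 versus 4. A comparison of overall means alone would see no difference, whereas the two hurdle components each differ between the groups.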

Identifiers

pubmed: 31009065
doi: 10.1111/biom.13074

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1051-1062

Copyright information

© 2019 The International Biometric Society.

Authors

Michael Sekula (M)

Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky.

Jeremy Gaskins (J)

Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky.

Susmita Datta (S)

Department of Biostatistics, University of Florida, Gainesville, Florida.
