Differential gene expression analysis for multi-subject single-cell RNA-sequencing studies with aggregateBioVar.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
11 Oct 2021
History:
received: 03 09 2020
revised: 07 04 2021
accepted: 30 04 2021
medline: 11 5 2021
pubmed: 11 5 2021
entrez: 10 5 2021
Status: ppublish

Abstract

Single-cell RNA-sequencing (scRNA-seq) provides more granular biological information than bulk RNA-sequencing, but bulk RNA-sequencing remains popular because its lower cost allows researchers to process more biological replicates and design more powerful studies. As scRNA-seq costs have decreased, collecting data from more than one biological replicate has become more feasible, but careful modeling of the different layers of biological variation remains challenging for many users. Here, we propose a statistical model for scRNA-seq gene counts, describe a simple method for estimating model parameters and show that failing to account for additional biological variation in scRNA-seq studies can inflate the false discovery rates (FDRs) of statistical tests. First, in a simulation study, we show that when the gene expression distribution of a population of cells varies between subjects, a naïve approach to differential expression analysis will inflate the FDR. We then compare multiple differential expression testing methods on scRNA-seq datasets from human samples and from animal models. These analyses suggest that a naïve approach to differential expression testing could lead to many false discoveries; in contrast, an approach based on pseudobulk counts has better FDR control. A software package, aggregateBioVar, is freely available on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/aggregateBioVar.html) and is designed for compatibility with upstream and downstream methods in scRNA-seq data analysis pipelines. Raw gene-by-cell count matrices for pig scRNA-seq data are available as GEO accession GSE150211. Supplementary data are available at Bioinformatics online.
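
The pseudobulk idea mentioned in the abstract can be illustrated with a minimal sketch: counts from all cells belonging to the same subject are summed, gene by gene, so that downstream tests treat subjects rather than individual cells as the replicates. The example below is written in Python for illustration only. It is not the aggregateBioVar API (the package itself is distributed through Bioconductor), and the function name `pseudobulk`, the gene and subject labels, and the count values are all hypothetical.

```python
from collections import defaultdict


def pseudobulk(counts, cell_subject):
    """Sum gene-by-cell counts within each subject (hypothetical helper).

    counts: dict mapping gene name -> list of per-cell counts
    cell_subject: list giving the subject label of each cell, in the same
        order as the per-cell counts
    Returns a dict mapping gene name -> {subject: summed count}.
    """
    result = {}
    for gene, row in counts.items():
        if len(row) != len(cell_subject):
            raise ValueError(
                f"gene {gene!r}: expected {len(cell_subject)} cells, got {len(row)}"
            )
        totals = defaultdict(int)
        # Accumulate each cell's count into its subject's running total.
        for value, subject in zip(row, cell_subject):
            totals[subject] += value
        result[gene] = dict(totals)
    return result


# Toy data (made up): 2 genes measured in 5 cells from 2 subjects.
counts = {"GENE_A": [0, 3, 1, 2, 4], "GENE_B": [5, 0, 0, 1, 1]}
cell_subject = ["s1", "s1", "s2", "s2", "s2"]

agg = pseudobulk(counts, cell_subject)
for gene, per_subject in agg.items():
    print(gene, per_subject)
```

The resulting gene-by-subject table has one column per biological replicate, which is the shape that bulk RNA-seq differential expression tools expect.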

Identifiers

pubmed: 33970215
pii: 6273181
doi: 10.1093/bioinformatics/btab337
pmc: PMC8504643

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

3243-3251

Grants

Agency: NHLBI NIH HHS
ID: K01 HL140261
Country: United States
Agency: NIDDK NIH HHS
ID: P30 DK054759
Country: United States
Agency: NIEHS NIH HHS
ID: P30 ES005605
Country: United States
Agency: NIH HHS
ID: NHLBI K01HL140261
Country: United States
Agency: NIH HHS
ID: NIDDK DK54759
Country: United States
Agency: NIH HHS
ID: NIEHS ES005605
Country: United States

Copyright information

© The Author(s) 2021. Published by Oxford University Press.

Authors

Andrew L Thurman (AL)

Department of Internal Medicine, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, IA 52242, USA.

Jason A Ratcliff (JA)

Iowa Institute of Human Genetics, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, IA 52242, USA.

Michael S Chimenti (MS)

Iowa Institute of Human Genetics, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, IA 52242, USA.

Alejandro A Pezzulo (AA)

Department of Internal Medicine, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, IA 52242, USA.

MeSH classifications