Compound models and Pearson residuals for normalization of single-cell RNA-seq data without UMIs.
Journal
bioRxiv : the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187
Publication information
Publication date:
05 Aug 2023
History:
pubmed: 14 Aug 2023
medline: 14 Aug 2023
entrez: 14 Aug 2023
Status:
epublish
Abstract
Before downstream analysis can reveal biological signals in single-cell RNA sequencing data, normalization and variance stabilization are required to remove technical noise. Recently, Pearson residuals based on negative binomial models have been suggested as an efficient normalization approach. These methods were developed for UMI-based sequencing protocols, where unique molecular identifiers (UMIs) help to remove PCR amplification noise by keeping track of the original molecules. In contrast, full-length protocols such as Smart-seq2 lack UMIs and retain amplification noise, making negative binomial models inapplicable. Here, we extend Pearson residuals to such read count data by modeling them as a compound process: we assume that the captured RNA molecules follow the negative binomial distribution, but are replicated according to an amplification distribution. Based on this model, we introduce
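The compound process described in the abstract can be illustrated with a minimal sketch. It assumes that each captured molecule is amplified into 1 + Poisson(lam) reads; this amplification distribution is an illustrative choice only, and the abstract does not state which distribution the authors use. The function names (simulate_reads, compound_moments, pearson_residuals) and parameters (mu, theta, lam) are hypothetical. The moments follow from the standard formulas for a random sum, E[X] = E[N] E[Z] and Var[X] = E[N] Var[Z] + Var[N] E[Z]^2, where N is the molecule count and Z the number of reads per molecule.

```python
import numpy as np


def simulate_reads(rng, mu, theta, lam, size):
    """Simulate read counts from a compound model (illustrative assumptions).

    Molecule counts N follow a negative binomial with mean mu and
    overdispersion theta. Each molecule yields Z = 1 + Poisson(lam) reads,
    an assumed amplification distribution chosen for simplicity.
    """
    # NumPy parameterizes the NB by (n, p); n = theta, p = theta / (theta + mu)
    # gives mean mu and variance mu + mu**2 / theta.
    p = theta / (theta + mu)
    molecules = rng.negative_binomial(theta, p, size=size)
    # The sum of N i.i.d. copies of 1 + Poisson(lam) is N + Poisson(N * lam).
    return molecules + rng.poisson(molecules * lam)


def compound_moments(mu, theta, amp_mean, amp_var):
    """Mean and variance of the compound read count X = Z_1 + ... + Z_N."""
    var_molecules = mu + mu**2 / theta
    mean_reads = mu * amp_mean
    var_reads = mu * amp_var + var_molecules * amp_mean**2
    return mean_reads, var_reads


def pearson_residuals(x, mean_reads, var_reads):
    """Pearson residuals: deviation from the model mean in model std units."""
    return (np.asarray(x) - mean_reads) / np.sqrt(var_reads)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, theta, lam = 5.0, 10.0, 3.0
    reads = simulate_reads(rng, mu, theta, lam, size=100_000)
    # For Z = 1 + Poisson(lam): E[Z] = 1 + lam and Var[Z] = lam.
    mean_reads, var_reads = compound_moments(mu, theta, 1.0 + lam, lam)
    residuals = pearson_residuals(reads, mean_reads, var_reads)
    print("model mean/variance:", mean_reads, var_reads)
    print("empirical mean/variance:", reads.mean(), reads.var())
    print("residual std:", residuals.std())
```

When the model matches the data, the residuals have mean close to 0 and standard deviation close to 1, which is the variance-stabilizing property that makes Pearson residuals useful for normalization.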
Identifiers
pubmed: 37577688
doi: 10.1101/2023.08.02.551637
pmc: PMC10418209
Publication types
Preprint
Languages
eng