scRAE: Deterministic Regularized Autoencoders With Flexible Priors for Clustering Single-Cell Gene Expression Data.


Journal

IEEE/ACM transactions on computational biology and bioinformatics
ISSN: 1557-9964
Abbreviated title: IEEE/ACM Trans Comput Biol Bioinform
Country: United States
NLM ID: 101196755

Publication information

Publication date:
History:
pubmed: 22 7 2021
medline: 13 10 2022
entrez: 21 7 2021
Status: ppublish

Abstract

Clustering single-cell RNA sequencing (scRNA-seq) data poses statistical and computational challenges because the data are high-dimensional and sparse, the sparsity being attributed to so-called 'dropout' events. Recently, Regularized Auto-Encoder (RAE) based deep neural network models have achieved remarkable success in learning robust low-dimensional representations. The basic idea in RAEs is to learn a non-linear mapping from the high-dimensional data space to a low-dimensional latent space and vice versa, while simultaneously imposing a distributional prior on the latent space, which has a regularizing effect. This paper argues that RAEs, in their naive formulation, suffer from the well-known bias-variance trade-off. While a simple AE without latent regularization over-fits the data, a very strong prior leads to under-representation and thus poor clustering. To address these issues, we propose a modified RAE framework, called scRAE, for effective clustering of single-cell RNA sequencing data. scRAE consists of a deterministic AE with a flexibly learnable prior generator network, which is trained jointly with the AE. This allows scRAE to strike a better trade-off between bias and variance in the latent space. We demonstrate the efficacy of the proposed method through extensive experimentation on several real-world single-cell gene expression datasets. The code for our work is available at https://github.com/arnabkmondal/scRAE.
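
The objective outlined in the abstract (reconstruction through a deterministic encoder/decoder, plus a penalty that pulls the latent codes toward samples from a learnable prior generator) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation, which is available at the repository above. The networks are tiny untrained NumPy MLPs, the data are toy counts, and a maximum mean discrepancy (MMD) with an RBF kernel is assumed here as a stand-in for whatever distribution-matching term the paper actually uses. All names (`mlp`, `init`, `rbf_mmd2`, `rae_loss`, `lam`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(n_in, n_hidden, n_out):
    """Random weights for a one-hidden-layer network (illustrative only)."""
    return (rng.normal(0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(0, 0.1, (n_hidden, n_out)), np.zeros(n_out))

def mlp(params, x):
    """One-hidden-layer network with a tanh activation."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2

def rbf_mmd2(a, b, bandwidth=1.0):
    """Biased squared-MMD estimate between two sample sets (RBF kernel).

    Assumption: MMD is a stand-in for the paper's latent regularizer.
    """
    def k(x, y):
        d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2.0 * k(a, b).mean()

def rae_loss(x, enc, dec, gen, noise, lam=1.0):
    """Reconstruction error plus lam times the latent/prior mismatch."""
    z = mlp(enc, x)            # deterministic latent codes
    x_hat = mlp(dec, z)        # reconstruction in data space
    z_prior = mlp(gen, noise)  # samples from the learnable prior generator
    recon = ((x - x_hat) ** 2).mean()
    reg = rbf_mmd2(z, z_prior)
    return recon + lam * reg, recon, reg

# Toy stand-in for a sparse, high-dimensional expression matrix.
n_cells, n_genes, n_latent, n_noise = 64, 200, 10, 10
x = np.log1p(rng.poisson(0.3, (n_cells, n_genes)).astype(float))

enc = init(n_genes, 32, n_latent)
dec = init(n_latent, 32, n_genes)
gen = init(n_noise, 32, n_latent)
noise = rng.normal(size=(n_cells, n_noise))

total, recon, reg = rae_loss(x, enc, dec, gen, noise, lam=1.0)
print(f"total={total:.4f} recon={recon:.4f} reg={reg:.4f}")
```

In this sketch, `lam` controls the trade-off the abstract describes: `lam=0` reduces to a plain autoencoder with no latent regularization, while a large `lam` forces the codes toward the prior. Because `gen` would itself be trained jointly with `enc` and `dec`, the prior can adapt to the data rather than being fixed in advance.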

Identifiers

pubmed: 34288873
doi: 10.1109/TCBB.2021.3098394

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2996-3007
