Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
11 05 2020
History:
received: 17 11 2019
accepted: 20 03 2020
entrez: 13 5 2020
pubmed: 13 5 2020
medline: 6 8 2020
Status: epublish

Abstract

Single-cell RNA sequencing (scRNA-seq) can characterize cell types and states through unsupervised clustering, but the ever-increasing number of cells and batch effects impose computational challenges. We present DESC, an unsupervised deep embedding algorithm that clusters scRNA-seq data by iteratively optimizing a clustering objective function. Through iterative self-learning, DESC gradually removes batch effects, as long as technical differences across batches are smaller than true biological variations. Because DESC is a soft clustering algorithm, its cluster assignment probabilities are biologically interpretable and can reveal both discrete and pseudotemporal structure of cells. Comprehensive evaluations show that DESC offers a proper balance of clustering accuracy and stability, has a small memory footprint, does not explicitly require batch information for batch effect removal, and can use a GPU when available. As the scale of single-cell studies continues to grow, we believe DESC will offer a valuable tool for biomedical researchers to disentangle complex cellular heterogeneity.
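
The abstract does not spell out the clustering objective, so the following is only a minimal sketch of what "soft cluster assignment probabilities" and one evaluation of an iteratively optimized objective might look like. It assumes the Student's t kernel and sharpened target distribution of deep embedded clustering (Xie, Girshick & Farhadi, 2016), and it uses toy 2-D points in place of a learned autoencoder embedding. The function names (soft_assign, target_distribution, kl_divergence) are hypothetical and are not the DESC package API.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Soft assignment q[i, j] of point i to cluster j (Student's t kernel)."""
    # Squared Euclidean distances, shape (n_points, n_clusters).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize each row so it is a probability distribution over clusters.
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets p: square q, divide by soft cluster size, renormalize."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q), the quantity a self-learning step would try to reduce."""
    return float(np.sum(p * np.log(p / q)))

# Toy data (assumption): two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
centroids = np.array([[0.5, 0.5], [2.5, 2.5]])  # deliberately off-center start

q = soft_assign(z, centroids)   # soft cluster membership probabilities
p = target_distribution(q)      # higher-confidence version of q
loss = kl_divergence(p, q)      # objective value for this iteration
```

In a full method, the encoder weights and centroids would be updated by gradient descent on this loss and the targets recomputed periodically; that training loop is omitted here. Each row of q sums to 1, which is what makes the assignments readable as per-cell membership probabilities.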

Identifiers

pubmed: 32393754
doi: 10.1038/s41467-020-15851-3
pii: 10.1038/s41467-020-15851-3
pmc: PMC7214470

Publication types

Journal Article; Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

2338

Grants

Agency: NEI NIH HHS
ID: R01 EY030192
Country: United States
Agency: NEI NIH HHS
ID: R01 EY031209
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM108600
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM125301
Country: United States


Authors

Xiangjie Li (X)

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, 100872, China.
State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100037, China.

Kui Wang (K)

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
Department of Information Theory and Data Science, School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China.

Yafei Lyu (Y)

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.

Huize Pan (H)

Division of Cardiology, Department of Medicine, Columbia University Medical Center, New York, NY, 10032, USA.

Jingxiao Zhang (J)

Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, 100872, China.

Dwight Stambolian (D)

Department of Ophthalmology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.

Katalin Susztak (K)

Departments of Medicine and Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.

Muredach P Reilly (MP)

Division of Cardiology, Department of Medicine, Columbia University Medical Center, New York, NY, 10032, USA.

Gang Hu (G)

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA. huggs@nankai.edu.cn.
School of Statistics and Data Science, Key Laboratory for medical Data Analysis and Statistical Research of Tianjin, Nankai University, Tianjin, 300071, China. huggs@nankai.edu.cn.

Mingyao Li (M)

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA. mingyao@pennmedicine.upenn.edu.
