scClassify: sample size estimation and multiscale classification of cells using single and multiple reference.


Journal

Molecular systems biology
ISSN: 1744-4292
Abbreviated title: Mol Syst Biol
Country: England
NLM ID: 101235389

Publication information

Publication date:
06 2020
History:
received: 03 12 2019
revised: 22 05 2020
accepted: 26 05 2020
entrez: 23 6 2020
pubmed: 23 6 2020
medline: 9 7 2021
Status: ppublish

Abstract

Automated cell type identification is a key computational challenge in single-cell RNA-sequencing (scRNA-seq) data. To capitalise on the large collection of well-annotated scRNA-seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hierarchies constructed from single or multiple annotated datasets as references. scClassify enables the estimation of sample size required for accurate classification of cell types in a cell type hierarchy and allows joint classification of cells when multiple references are available. We show that scClassify consistently performs better than other supervised cell type classification methods across 114 pairs of reference and testing data, representing a diverse combination of sizes, technologies and levels of complexity, and further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single-cell atlases and highlight a novel application of identifying subpopulations of cells from the Tabula Muris data that were unidentified in the original publication. Together, scClassify represents state-of-the-art methodology in automated cell type identification from scRNA-seq data.

Identifiers

pubmed: 32567229
doi: 10.15252/msb.20199389
pmc: PMC7306901

Databases

GEO
GSE84133, GSE86469, GSE85241, GSE71585, GSE83139, GSE81608

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e9389

Grants

Agency: NIDCD NIH HHS
ID: R21 DC015107
Country: United States

Copyright information

© 2020 The Authors. Published under the terms of the CC BY 4.0 license.

Authors

Yingxin Lin (Y)

School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia.
Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia.

Yue Cao (Y)

School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia.
Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia.

Hani Jieun Kim (HJ)

School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia.
Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia.
Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, Australia.

Agus Salim (A)

Department of Mathematics and Statistics, La Trobe University, Bundoora, VIC, Australia.
Baker Heart and Diabetes Institute, Melbourne, VIC, Australia.
Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.

Terence P Speed (TP)

Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.

David M Lin (DM)

Department of Biomedical Sciences, Cornell University, Ithaca, NY, USA.

Pengyi Yang (P)

School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia.
Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia.
Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, Australia.

Jean Yee Hwa Yang (JYH)

School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia.
Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia.
