Immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets.
cell clustering
cell type classification
gene signature discovery
machine learning
single-cell RNA sequencing
tumor microenvironment
Journal
Frontiers in immunology
ISSN: 1664-3224
Abbreviated title: Front Immunol
Country: Switzerland
NLM ID: 101560960
Publication information
Publication date: 2023
History:
received: 2023-03-27
accepted: 2023-07-14
medline: 2023-08-24
pubmed: 2023-08-23
entrez: 2023-08-23
Status: epublish
Abstract
Abstract sections
Background
Robust immune cell gene expression signatures are central to the analysis of single-cell studies. Nearly all known sets of immune cell signatures have each been derived from only a single gene expression dataset. Integrating multiple datasets could yield high-quality immune cell signatures that serve as superior inputs to machine learning-based cell type classification approaches.
Results
We established a novel workflow for the discovery of immune cell type signatures based primarily on gene-versus-gene expression similarity. It leverages multiple datasets, here seven single-cell expression datasets from six different cancer types, and resulted in eleven immune cell type-specific gene expression signatures. We used these signatures to train random forest classifiers for immune cell type assignment in single-cell RNA-seq datasets. In independent benchmarking datasets, we obtained similar or better prediction results compared to commonly used methods for cell type assignment. Our gene signature set yields higher prediction scores than other published immune cell type gene sets in random forest-based cell type classification. We further demonstrate how our approach helps to avoid bias in downstream statistical analyses by re-analyzing a published IFN stimulation experiment.
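The two ideas named in these results, gene-versus-gene expression similarity and random forest classification on signature genes, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the gene names and the three small signatures are hypothetical, Pearson correlation stands in for whatever similarity measure the authors used, and the classifier settings are arbitrary. It is not the authors' workflow or their published signatures.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a normalized expression matrix (cells x genes).
n_cells_per_type, n_genes = 200, 50
gene_names = [f"gene_{i}" for i in range(n_genes)]

# Hypothetical signatures: three made-up marker genes per cell type.
signatures = {
    "T cell": ["gene_0", "gene_1", "gene_2"],
    "B cell": ["gene_3", "gene_4", "gene_5"],
    "Monocyte": ["gene_6", "gene_7", "gene_8"],
}

blocks, labels = [], []
for cell_type, markers in signatures.items():
    expr = rng.normal(0.0, 1.0, size=(n_cells_per_type, n_genes))
    marker_idx = [gene_names.index(g) for g in markers]
    expr[:, marker_idx] += 3.0  # markers are elevated in their own cell type
    blocks.append(expr)
    labels += [cell_type] * n_cells_per_type
X = np.vstack(blocks)
y = np.array(labels)

# Gene-versus-gene similarity, here assumed to be Pearson correlation across
# cells: markers of the same cell type rise and fall together.
corr = np.corrcoef(X.T)  # genes x genes
same_signature = corr[0, 1]   # gene_0 vs gene_1, both "T cell" markers
other_signature = corr[0, 3]  # gene_0 vs gene_3, a "B cell" marker
print(f"same signature: {same_signature:.2f}, other: {other_signature:.2f}")

# Restrict classifier features to the union of all signature genes.
signature_genes = sorted({g for genes in signatures.values() for g in genes})
feature_idx = [gene_names.index(g) for g in signature_genes]
X_sig = X[:, feature_idx]

X_train, X_test, y_train, y_test = train_test_split(
    X_sig, y, test_size=0.25, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

In this toy setting the classifier sees only 9 of the 50 genes, which mirrors the idea of a slim feature set; real single-cell data would need normalization, batch handling, and evaluation on independent datasets.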
Discussion and conclusion
We demonstrated the quality of our immune cell signatures and their strong performance in a random forest-based cell typing approach. We argue that classifying cells based on our comparatively slim sets of genes, combined with a random forest-based approach, not only matches or outperforms widely used published approaches but also facilitates unbiased downstream statistical analyses of differential gene expression between cell types for significantly more genes than previous cell classification algorithms.
Identifiers
pubmed: 37609075
doi: 10.3389/fimmu.2023.1194745
pmc: PMC10441575
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
1194745
Copyright information
Copyright © 2023 Aybey, Zhao, Brors and Staub.
Conflict of interest statement
ES and SZ are employees of Merck KGaA Darmstadt. BA receives funding from Merck KGaA Darmstadt for his doctoral studies. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.