HArmonized single-cell RNA-seq Cell type Assisted Deconvolution (HASCAD).
Cell composition deconvolution
Deep learning
Harmonization
RNA-seq
Journal
BMC medical genomics
ISSN: 1755-8794
Abbreviated title: BMC Med Genomics
Country: England
NLM ID: 101319628
Publication information
Publication date: 31 Oct 2023
History:
received: 11 Oct 2022
accepted: 27 Sep 2023
medline: 2 Nov 2023
pubmed: 1 Nov 2023
entrez: 1 Nov 2023
Status: epublish
Abstract
BACKGROUND
Cell composition deconvolution (CCD) is a bioinformatic task that estimates cell fractions from bulk gene expression profiles, such as RNA-seq data. Many CCD models perform linear regression analysis using reference gene expression signatures of distinct cell types. These reference signatures can be generated from cell-specific gene expression profiles, such as scRNA-seq data. However, the batch effects and dropout events frequently observed across scRNA-seq datasets have limited the performance of CCD methods.
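As a rough illustration of the regression-based approach described above, the sketch below estimates cell fractions by non-negative least squares, solved here with projected gradient descent in plain Python. The signature matrix, the gene and cell-type counts, and the function name `deconvolve` are hypothetical; this is not the HASCAD model, nor the algorithm of any published CCD tool.

```python
def deconvolve(signature, bulk, n_iter=2000):
    """Estimate non-negative cell fractions f that minimize
    0.5 * ||signature @ f - bulk||^2, then rescale them to sum to 1.

    signature: list of rows, one per gene, one column per cell type.
    bulk: list of bulk expression values, one per gene.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # Safe step size: the squared Frobenius norm bounds the largest
    # eigenvalue of signature^T signature (assumes a non-zero matrix).
    step = 1.0 / sum(v * v for row in signature for v in row)
    f = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # Residual of the current fit, one value per gene.
        residual = [
            sum(signature[g][k] * f[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # Gradient of the least-squares loss with respect to f.
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto f >= 0.
        f = [max(0.0, f[k] - step * grad[k]) for k in range(n_types)]
    total = sum(f)
    return [v / total for v in f] if total > 0 else f


# Hypothetical signature matrix: 4 genes (rows) x 3 cell types (columns).
signature = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [2.0, 2.0, 2.0],
]
true_fractions = [0.5, 0.3, 0.2]
# Noise-free bulk profile built as a linear mixture of the signatures.
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]

fractions = deconvolve(signature, bulk)
print([round(v, 4) for v in fractions])
```

In this noise-free toy case the estimated fractions should match the mixing proportions closely. With real data, batch effects and dropout in the reference profiles distort the signature matrix, which is the limitation the abstract refers to.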
METHODS
We developed a deep neural network (DNN) model, HASCAD, to predict the cell fractions of up to 15 immune cell types. HASCAD was trained on bulk RNA-seq data simulated from three scRNA-seq datasets that were normalized using a Harmony-Symphony-based strategy. Mean square error and the Pearson correlation coefficient were used to compare the performance of HASCAD with that of other widely used CCD methods. Two types of datasets were used for benchmarking: a set of simulated bulk RNA-seq samples and three human PBMC RNA-seq datasets.
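The general idea of simulating bulk samples from single-cell profiles, and of scoring predictions with the two metrics named above, can be sketched as follows. The abstract does not describe the sampling scheme, so uniform sampling with replacement is an assumption here, and the toy cells, labels, and function names (`simulate_bulk`, `mse`, `pearson`) are hypothetical.

```python
import random


def simulate_bulk(cells, labels, cell_types, n_cells=500, seed=0):
    """Build one pseudo-bulk sample by summing randomly drawn cells.

    Returns the summed expression per gene and the true fraction of
    each cell type among the drawn cells.
    """
    rng = random.Random(seed)
    drawn = [rng.randrange(len(cells)) for _ in range(n_cells)]
    n_genes = len(cells[0])
    bulk = [sum(cells[i][g] for i in drawn) for g in range(n_genes)]
    fractions = [
        sum(1 for i in drawn if labels[i] == t) / n_cells for t in cell_types
    ]
    return bulk, fractions


def mse(pred, true):
    """Mean square error between predicted and true fractions."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Toy single-cell data: 6 cells x 3 genes, with hypothetical labels.
cells = [[5, 0, 1], [4, 1, 0], [0, 6, 1], [1, 5, 0], [0, 1, 7], [1, 0, 6]]
labels = ["T", "T", "B", "B", "NK", "NK"]
cell_types = ["T", "B", "NK"]

bulk, true_fractions = simulate_bulk(cells, labels, cell_types)
predicted = [0.30, 0.35, 0.35]  # stand-in for a model's output
print(mse(predicted, true_fractions), pearson(predicted, true_fractions))
```

Because the true fractions are known for every simulated sample, such pairs can serve both as training targets and as ground truth when comparing methods.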
RESULTS
HASCAD is useful for investigating how immune cell heterogeneity affects the therapeutic effects of immune checkpoint inhibitors, because its target cell types include those known to play a role in anti-tumor immunity, such as three subtypes of CD8 T cells and three subtypes of CD4 T cells. We found that removing batch effects from the reference scRNA-seq datasets can benefit CCD. Our benchmarks showed that HASCAD is more suitable for analyzing bulk RNA-seq data than two widely used CCD methods, CIBERSORTx and quanTIseq. We applied HASCAD to the liver cancer samples of TCGA-LIHC and found that the predicted abundances of Treg and effector CD8 T cells were significantly associated with patients' overall survival.
CONCLUSION
HASCAD can predict the cell composition of PBMC bulk RNA-seq samples and classify the cell type from pure bulk RNA-seq. The HASCAD model is available at https://github.com/holiday01/HASCAD.
Identifiers
pubmed: 37907883
doi: 10.1186/s12920-023-01674-w
pii: 10.1186/s12920-023-01674-w
pmc: PMC10619225
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
272
Copyright information
© 2023. The Author(s).