A convolutional neural network model for survival prediction based on prognosis-related cascaded Wx feature selection.
Journal
Laboratory investigation; a journal of technical methods and pathology
ISSN: 1530-0307
Abbreviated title: Lab Invest
Country: United States
NLM ID: 0376617
Publication information
Publication date:
10 2022
History:
received: 17 02 2022
accepted: 26 04 2022
revised: 22 04 2022
pubmed: 10 7 2022
medline: 28 9 2022
entrez: 9 7 2022
Status:
ppublish
Abstract
Great advances in deep learning have provided effective solutions for prediction tasks in the biomedical field. However, accurate prognosis prediction from cancer genomics data remains challenging because of the severe overfitting caused by the curse of dimensionality inherent in high-throughput sequencing data. Moreover, survival analysis poses unique challenges arising from the difficulty of utilizing censored samples, whose events of interest are not observed. Convolutional neural network (CNN) models offer the opportunity to extract meaningful hierarchical features that characterize cancer subtypes and prognostic outcomes. Feature selection, in turn, can mitigate overfitting and reduce the computational burden of subsequent model training by separating significant genes from redundant ones. To simplify the model, we developed a concise and efficient survival analysis model, named the CNN-Cox model, which combines a special CNN framework with the prognosis-related feature selection method cascaded Wx and requires less computation because it uses a lightweight set of training parameters. Experimental results show that the CNN-Cox model achieved consistently higher C-index values and better survival prediction performance than existing state-of-the-art survival analysis methods across seven cancer type datasets in The Cancer Genome Atlas cohort: bladder carcinoma, head and neck squamous cell carcinoma, kidney renal cell carcinoma, brain low-grade glioma, lung adenocarcinoma (LUAD), lung squamous cell carcinoma, and skin cutaneous melanoma. As an illustration of model interpretation, we examined potential prognostic gene signatures in the LUAD dataset using the proposed CNN-Cox model. We conducted protein-protein interaction network analysis to identify potential prognostic genes and further analyzed the biological function of 13 hub genes (ANLN, RACGAP1, KIF4A, KIF20A, KIF14, ASPM, CDK1, SPC25, NCAPG, MKI67, HJURP, EXO1, and HMMR) whose high expression is significantly associated with poor survival of LUAD patients. These findings confirmed that the CNN-Cox model is effective in extracting not only prognostic factors but also biologically meaningful gene features. The code is available on GitHub: https://github.com/wangwangCCChen/CNN-Cox
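The abstract relies on two standard survival-analysis quantities: a Cox-type training objective that can use censored samples, and the C-index used to compare models. The sketch below illustrates both in plain Python. It is not taken from the authors' repository. The function names (`cox_neg_log_partial_likelihood`, `concordance_index`) and the toy data are hypothetical, the loss is the textbook negative Cox partial log-likelihood (averaged over observed events), and the C-index follows Harrell's usual pairwise definition.

```python
import math


def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk:  predicted log-risk score per patient (higher = worse prognosis)
    time:  follow-up time per patient
    event: 1 if the event was observed, 0 if the patient was censored
    """
    n_events = sum(event)
    if n_events == 0:
        return 0.0
    total = 0.0
    for i in range(len(risk)):
        if event[i] != 1:
            continue  # censored patients enter only through the risk sets
        # Risk set: every patient still under observation at time[i].
        at_risk = [risk[j] for j in range(len(risk)) if time[j] >= time[i]]
        # Numerically stable log-sum-exp over the risk set.
        m = max(at_risk)
        log_sum = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += risk[i] - log_sum
    return -total / n_events


def concordance_index(risk, time, event):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before the follow-up time of patient j.
    """
    concordant = 0.0
    comparable = 0
    n = len(risk)
    for i in range(n):
        if event[i] != 1:
            continue
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count half
    return concordant / comparable if comparable else float("nan")


# Hypothetical toy cohort: predicted risk decreases as survival time grows,
# so every comparable pair is ranked correctly.
toy_time = [5.0, 8.0, 3.0, 10.0, 6.0]
toy_event = [1, 0, 1, 1, 0]
toy_risk = [0.8, 0.1, 1.5, -0.3, 0.2]

print(concordance_index(toy_risk, toy_time, toy_event))
print(cox_neg_log_partial_likelihood(toy_risk, toy_time, toy_event))
```

In this formulation a censored patient never contributes a term of their own, but still appears in the risk set of every event that occurs before their censoring time. This is how the Cox objective makes use of samples whose events are not observed.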
Identifiers
pubmed: 35810236
doi: 10.1038/s41374-022-00801-y
pii: S0023-6837(22)00114-3
Chemical substances
Nerve Tissue Proteins (0)
KIF4A protein, human (EC 3.6.1.-)
Kinesins (EC 3.6.4.4)
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1064-1074
Copyright information
© 2022. The Author(s), under exclusive licence to United States and Canadian Academy of Pathology.