Survival analysis for lung cancer patients: A comparison of Cox regression and machine learning models.
Data science
Epidemiology
Explainable AI
Lung cancer
Survival analysis
Journal
International journal of medical informatics
ISSN: 1872-8243
Abbreviated title: Int J Med Inform
Country: Ireland
NLM ID: 9711057
Publication information
Publication date:
26 Aug 2024
History:
received: 22 Mar 2024
revised: 12 Jul 2024
accepted: 21 Aug 2024
medline: 31 Aug 2024
pubmed: 31 Aug 2024
entrez: 29 Aug 2024
Status:
aheadofprint
Abstract
Survival analysis based on cancer registry data is of paramount importance for monitoring the effectiveness of health care. As new methods arise, the compendium of statistical tools applicable to cancer registry data grows. In recent years, machine learning approaches for survival analysis have been developed. The aim of this study is to compare the model performance of the well-established Cox regression and novel machine learning approaches on a previously unused dataset. The study is based on lung cancer data from the Schleswig-Holstein Cancer Registry. Four survival analysis models are compared: Cox Proportional Hazards Regression (CoxPH) as the most commonly used statistical model, as well as Random Survival Forests (RSF) and two neural network architectures based on the DeepSurv and TabNet approaches. The models are evaluated using the concordance index (C-I), the Brier score and the AUC-ROC score. In addition, to gain more insight into the decision process of the models, we identified the features that have a higher impact on patient survival using permutation feature importance scores and SHAP values. Using a dataset that includes the cancer stage established by the Union for International Cancer Control (UICC), the best performing model is the CoxPH (C-I: 0.698±0.005), while using a dataset that includes the tumor size, lymph node and metastasis status (TNM) leads to the RSF as the best performing model (C-I: 0.703±0.004). The explainability metrics show that the models rely primarily on the combined UICC stage and the metastasis status, which is consistent with other studies. The studied methods are highly relevant for epidemiological researchers to create more accurate survival models, which can help physicians make informed decisions about appropriate therapies and management of patients with lung cancer, ultimately improving survival and quality of life.
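To make the headline metric concrete, the following is a minimal sketch of how a concordance index can be computed for right-censored survival data. The abstract does not state which variant of the index the authors used, so this assumes Harrell's pairwise definition; the function name `concordance_index` and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell-style concordance index (assumed variant, see lead-in).

    times  : observed follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : model risk score, higher meaning shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is comparable only if the earlier time is an observed event.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier death got the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.6, 0.7, 0.1]

# 4 of the 5 comparable pairs are ordered correctly.
print(concordance_index(times, events, risks))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of roughly 0.70 into context. The quadratic loop is kept for readability; on registry-scale data a library implementation would normally be preferred.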
Identifiers
pubmed: 39208536
pii: S1386-5056(24)00270-3
doi: 10.1016/j.ijmedinf.2024.105607
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
105607
Copyright information
Copyright © 2024. Published by Elsevier B.V.
Conflict of interest statement
Declaration of Competing Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.