Statistical models versus machine learning for competing risks: development and validation of prognostic models.
Artificial neural networks
Competing risks
Predictive performance
Random survival forests
Regression models
Supervised machine learning
Survival analysis
Journal
BMC medical research methodology
ISSN: 1471-2288
Abbreviated title: BMC Med Res Methodol
Country: England
NLM ID: 100968545
Publication information
Publication date: 24 02 2023
History:
received: 15 09 2022
accepted: 13 02 2023
entrez: 24 2 2023
pubmed: 25 2 2023
medline: 3 3 2023
Status: epublish
Abstract
Abstract sections
BACKGROUND
In health research, several chronic diseases are susceptible to competing risks (CRs). Statistical models (SM) were initially developed to estimate the cumulative incidence of an event in the presence of CRs. Given the recent growing interest in applying machine learning (ML) to clinical prediction, these techniques have also been extended to model CRs, but the literature is limited. Here, our aim is to investigate the potential role of ML versus SM for CRs within non-complex data (small/medium sample size, low-dimensional setting).
METHODS
A dataset of 3826 retrospectively collected patients with extremity soft-tissue sarcoma (eSTS) and nine predictors is used to evaluate the models' predictive performance in terms of discrimination and calibration. Two SM (cause-specific Cox, Fine-Gray) and three ML techniques are compared for CRs in a simple clinical setting. The ML models include an original partial logistic artificial neural network for CRs (PLANNCR original), a PLANNCR with novel architectural specifications (PLANNCR extended), and a random survival forest for CRs (RSFCR). The clinical endpoint is the time in years between surgery and disease progression (event of interest) or death (competing event). Time points of interest are 2, 5, and 10 years.
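All five models target the cumulative incidence function (CIF) of an event in the presence of CRs. The non-parametric benchmark for the CIF is the Aalen–Johansen estimator; a minimal pure-Python sketch is given below (a hypothetical helper for illustration, not the authors' code; causes are coded 0 = censored, 1, 2, … = event types):

```python
def cumulative_incidence(times, causes, cause_of_interest, eval_times):
    """Aalen-Johansen estimate of the cumulative incidence of one event
    type in the presence of competing risks.

    times  : follow-up time per subject
    causes : 0 = censored, 1, 2, ... = observed event type
    Returns the CIF evaluated at each horizon in eval_times.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0        # all-cause Kaplan-Meier survival just before t
    cif = 0.0
    jumps = []        # (event time, CIF value after the jump)
    i = 0
    while i < len(data):
        t = data[i][0]
        d_any = d_int = n_cens = 0
        # pool all subjects sharing this time point (ties)
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 0:
                n_cens += 1
            else:
                d_any += 1
                if data[i][1] == cause_of_interest:
                    d_int += 1
            i += 1
        if d_any:
            cif += surv * d_int / n_at_risk   # mass added to the CIF
            surv *= 1 - d_any / n_at_risk     # all-cause survival update
            jumps.append((t, cif))
        n_at_risk -= d_any + n_cens
    # evaluate the step function at the requested horizons
    out = []
    for horizon in eval_times:
        value = 0.0
        for t, v in jumps:
            if t > horizon:
                break
            value = v
        out.append(value)
    return out
```

Note that, unlike one minus a cause-specific Kaplan–Meier curve, this estimator discounts each cause-of-interest event by the probability of still being event-free from all causes, so the CIFs of the two causes never sum to more than one.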
RESULTS
Based on the original eSTS data, 100 bootstrapped training datasets are drawn. Performance of the final models is assessed on validation data (the left-out samples) using the Brier score and the Area Under the Curve (AUC) adapted for CRs. Miscalibration (absolute accuracy error) is also estimated. The ML models reach performance comparable to the SM at 2, 5, and 10 years for both the Brier score and the AUC (95% confidence intervals overlap). However, the SM are frequently better calibrated.
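The evaluation scheme above can be sketched as follows. This is an illustrative simplification (hypothetical helpers, not the authors' code): the Brier score here assumes no censoring before the horizon, whereas the published analysis uses inverse-probability-of-censoring-weighted versions of the Brier score and AUC.

```python
import random

def brier_score_cr(times, causes, pred_cif, horizon, cause=1):
    """Fixed-horizon Brier score for a competing-risks outcome.

    pred_cif[i] is the model's predicted cumulative incidence of `cause`
    by `horizon` for subject i. Assumes no censoring before the horizon;
    real analyses reweight by the inverse probability of censoring.
    """
    observed = [1.0 if (t <= horizon and c == cause) else 0.0
                for t, c in zip(times, causes)]
    return sum((o - p) ** 2 for o, p in zip(observed, pred_cif)) / len(times)

def bootstrap_split(n, seed=0):
    """One bootstrap replicate: training indices drawn with replacement,
    validation = the left-out (out-of-bag) subjects, mirroring the
    100-replicate scheme described above."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    validation = sorted(set(range(n)) - set(train))
    return train, validation
```

In this scheme each model is refit on `train` and scored on `validation`, and the spread of the 100 replicate scores yields the reported confidence intervals.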
CONCLUSIONS
Overall, the ML techniques are less practical, as they require substantial implementation time (data preprocessing, hyperparameter tuning, computational intensity), whereas the regression methods perform well without this additional workload. As such, for non-complex real-life survival data, ML techniques should be applied only as a complement to SM, as exploratory tools for assessing model performance. More attention to model calibration is urgently needed.
Identifiers
pubmed: 36829145
doi: 10.1186/s12874-023-01866-z
pii: 10.1186/s12874-023-01866-z
pmc: PMC9951458
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination: 51
Copyright information
© 2023. The Author(s).