Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date: 2021-03-26
History:
received: 2020-10-12
accepted: 2021-03-15
entrez: 2021-03-27
pubmed: 2021-03-28
medline: 2021-10-16
Status:
epublish
Abstract
Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as those of classical methods, they are often disregarded because of their lack of transparency and explainability, both of which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry on 36,658 non-metastatic breast cancer patients to compare the performance of CPH with that of ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival, using the c-index as the performance measure. We demonstrated that, in our dataset, ML-based models can perform at least as well as the classical CPH regression (c-index [Formula: see text]), and in the case of XGB even better (c-index [Formula: see text]). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models' predictions. We concluded that the difference in performance can be attributed to XGB's ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models' predictions as well as the corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial for increasing the trust in and adoption of innovative ML techniques in oncology and in healthcare overall.
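The abstract compares models by their c-index (concordance index). As a rough illustration of what that metric measures, the following is a minimal sketch of Harrell's c-index for right-censored data. It is not the authors' code: the function name `concordance_index`, its arguments, and the toy data are assumptions made for this example, and pairs with tied follow-up times are simply skipped for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data (illustrative sketch).

    times:       observed follow-up time per subject
    events:      1 if the event was observed, 0 if the subject was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # 'a' is the subject with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk for the subject who failed first
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not from the study)
times = [5, 8, 12, 3, 10]
events = [1, 0, 1, 1, 0]
risk = [0.9, 0.4, 0.2, 0.8, 0.3]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. On the toy data, 6 of the 7 comparable pairs are ordered correctly.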
Identifiers
pubmed: 33772109
doi: 10.1038/s41598-021-86327-7
pii: 10.1038/s41598-021-86327-7
pmc: PMC7998037
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
6968