Two-Stage Approaches to Accounting for Patient Heterogeneity in Machine Learning Risk Prediction Models in Oncology.
Journal
JCO clinical cancer informatics
ISSN: 2473-4276
Abbreviated title: JCO Clin Cancer Inform
Country: United States
NLM ID: 101708809
Publication information
Publication date:
09 2021
History:
entrez: 30 Sep 2021
pubmed: 1 Oct 2021
medline: 3 Nov 2021
Status:
ppublish
Abstract
Machine learning models developed from electronic health record data are increasingly used to predict the risk of mortality for general oncology patients, but these models may perform suboptimally because of patient heterogeneity. The objective of this work was to develop a new modeling approach for predicting short-term mortality that accounts for heterogeneity across multiple subgroups in the presence of a large number of electronic health record predictors.
We proposed a two-stage approach to addressing heterogeneity among oncology patients with different cancer types when predicting their risk of mortality. Structured data were extracted from the University of Pennsylvania Health System for 20,723 patients with 11 cancer types, of whom 1,340 (6.5%) died. We first modeled the overall risk for all patients without differentiating cancer types, as is done in current practice. We then developed cancer type-specific models using the overall risk score as a predictor along with preselected type-specific predictors. The overall and type-specific models were compared with respect to discrimination, using the area under the precision-recall curve (AUPRC), and calibration, using the calibration slope. We also proposed metrics that characterize the degree of risk heterogeneity by comparing risk predictors in the overall and type-specific models.
The two-stage modeling resulted in improved calibration and discrimination across all 11 cancer types. The improvement in AUPRC was significant for hematologic malignancies including leukemia, lymphoma, and myeloma. For instance, the AUPRC increased from 0.358 to 0.519 (∆ = 0.161; 95% CI, 0.102 to 0.224) for leukemia and from 0.299 to 0.354 (∆ = 0.055; 95% CI, 0.009 to 0.107) for lymphoma. For all 11 cancer types, the two-stage approach generated well-calibrated risks. A high degree of heterogeneity between type-specific and overall risk predictors was observed for most cancer types.
Our two-stage modeling approach, which accounts for cancer type-specific risk heterogeneity, achieves better calibration and discrimination than a model agnostic to cancer types.
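The two-stage idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the data are synthetic, plain logistic regression fitted by gradient descent stands in for the machine learning models, the overall score enters stage two on the logit scale, and the cancer types, predictors x1 and x2, and all function names are hypothetical. AUPRC is approximated by average precision.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, init=None, lr=0.1, epochs=300):
    # Batch gradient descent on mean log loss; returns [intercept, coef_1, ...].
    n, p = len(X), len(X[0])
    w = list(init) if init is not None else [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, X):
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

def logit(p, eps=1e-9):
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def average_precision(y, scores):
    # Step-wise approximation of the area under the precision-recall curve.
    # Tied scores are ordered arbitrarily (by the stable sort).
    order = sorted(range(len(y)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else float("nan")

# Synthetic cohort (hypothetical): x1 acts in every cancer type,
# x2 carries risk only for leukemia.
rng = random.Random(0)
types, shared, specific, died = [], [], [], []
for _ in range(600):
    t = rng.choice(["leukemia", "lung"])
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    z = -2.0 + 1.0 * x1 + (1.5 * x2 if t == "leukemia" else 0.0)
    types.append(t)
    shared.append(x1)
    specific.append(x2)
    died.append(1 if rng.random() < sigmoid(z) else 0)

# Stage 1: one overall model for all patients, ignoring cancer type.
overall_w = fit_logistic([[x] for x in shared], died)
overall_risk = predict(overall_w, [[x] for x in shared])

# Stage 2: one model per cancer type, using the overall score (as a logit)
# plus a type-specific predictor.
two_stage_risk = [0.0] * len(died)
for t in sorted(set(types)):
    idx = [i for i, ti in enumerate(types) if ti == t]
    X_t = [[logit(overall_risk[i]), specific[i]] for i in idx]
    y_t = [died[i] for i in idx]
    # Start from "trust the overall score": intercept 0, slope 1, no extra effect.
    w_t = fit_logistic(X_t, y_t, init=[0.0, 1.0, 0.0])
    for i, p in zip(idx, predict(w_t, X_t)):
        two_stage_risk[i] = p
    print(t,
          "overall AP:", round(average_precision(y_t, [overall_risk[i] for i in idx]), 3),
          "two-stage AP:", round(average_precision(y_t, [two_stage_risk[i] for i in idx]), 3))
```

The printed values are in-sample and depend on the synthetic data, so they say nothing about the reported results; a real evaluation would use held-out data and would also check the calibration slope.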
Identifiers
pubmed: 34591602
doi: 10.1200/CCI.21.00077
pmc: PMC8812620
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1015-1023