Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods.

COVID-19 mortality; Classification trees; Extreme gradient boosting; Generalized additive model; Predictive model

Journal

BMC Medical Research Methodology
ISSN: 1471-2288
Abbreviated title: BMC Med Res Methodol
Country: England
NLM ID: 100968545

Publication information

Publication date:
2021-11-27
History:
received: 2021-04-23
accepted: 2021-10-14
entrez: 2021-11-28
pubmed: 2021-11-29
medline: 2021-12-15
Status: epublish

Abstract

Coronavirus disease (COVID-19) presents an unprecedented threat to global health. Accurately predicting mortality risk among infected individuals is crucial for prioritizing medical care and mitigating the burden on the healthcare system. The present study aimed to assess the accuracy of machine learning methods in predicting COVID-19 mortality risk. We compared the performance of classification tree, random forest (RF), extreme gradient boosting (XGBoost), logistic regression, generalized additive model (GAM) and linear discriminant analysis (LDA) in predicting mortality risk among 49,216 COVID-19 positive cases in Toronto, Canada, reported from March 1 to December 10, 2020. We used repeated split-sample validation and k-steps-ahead forecasting validation. Predictive models were estimated using training samples, and the predictive accuracy of each method on the testing samples was assessed using the area under the receiver operating characteristic curve (AUC), Brier's score, calibration intercept and calibration slope. We found that XGBoost is highly discriminative, with an AUC of 0.9669, and outperforms conventional tree-based methods (classification tree and RF) in predicting COVID-19 mortality risk. Regression-based methods (logistic, GAM and LASSO) performed comparably to XGBoost, with slightly lower AUCs and higher Brier's scores. XGBoost offers superior performance over conventional tree-based methods and a minor improvement over regression-based methods for predicting COVID-19 mortality risk in the study population.
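
The four accuracy measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact definitions or software, so everything below is an assumption: the function names are hypothetical, the AUC uses the rank-sum identity, the calibration slope is taken from a logistic regression of the outcome on the logit of the predicted probability, and the calibration intercept is estimated with that slope fixed at 1. The data are simulated and have no connection to the Toronto cohort.

```python
import math
import random


def auc(y, p):
    """AUC via the rank-sum (Mann-Whitney) identity, with average ranks for ties."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks = [0.0] * len(p)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank
        i = j + 1
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    rank_sum = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def brier(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def _logit(p, eps=1e-12):
    p = min(max(p, eps), 1 - eps)  # guard against log(0)
    return math.log(p / (1 - p))


def _sigmoid(z):
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def calibration_slope(y, p, iters=50):
    """Slope b from fitting logit P(y=1) = a + b * logit(p) by Newton-Raphson."""
    lp = [_logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, x in zip(y, lp):
            mu = _sigmoid(a + b * x)
            w = mu * (1 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0:
            break
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return b


def calibration_intercept(y, p, iters=50):
    """Intercept a from fitting logit P(y=1) = a + logit(p), slope fixed at 1."""
    lp = [_logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        g = h = 0.0
        for yi, x in zip(y, lp):
            mu = _sigmoid(a + x)
            g += yi - mu
            h += mu * (1 - mu)
        if h == 0:
            break
        step = g / h
        a += step
        if abs(step) < 1e-10:
            break
    return a


# Simulated, well-calibrated predictions (illustrative data only).
rng = random.Random(0)
probs = [rng.uniform(0.05, 0.95) for _ in range(2000)]
outcomes = [1 if rng.random() < pi else 0 for pi in probs]
print("AUC:", auc(outcomes, probs))
print("Brier score:", brier(outcomes, probs))
print("Calibration intercept:", calibration_intercept(outcomes, probs))
print("Calibration slope:", calibration_slope(outcomes, probs))
```

Under these definitions, a well-calibrated model has an intercept near 0 and a slope near 1; a slope below 1 suggests predictions that are too extreme, and a lower Brier's score indicates better overall accuracy.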

Identifiers

pubmed: 34837951
doi: 10.1186/s12874-021-01441-4
pii: 10.1186/s12874-021-01441-4
pmc: PMC8627169

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

267

Copyright information

© 2021. The Author(s).

Authors

Cindy Feng (C)

Department of Community Health and Epidemiology, Faculty of Medicine, Dalhousie University, 5790 University Avenue, Halifax, B3H 1V7, NS, Canada. cindy.feng@dal.ca.

George Kephart (G)

Department of Community Health and Epidemiology, Faculty of Medicine, Dalhousie University, 5790 University Avenue, Halifax, B3H 1V7, NS, Canada.

Elizabeth Juarez-Colunga (E)

Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, 80045, USA.
