Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients With COVID-19 in New York City: Model Development and Validation.

MeSH terms: Acute Kidney Injury / epidemiology; Adolescent; Adult; Aged; Aged, 80 and over; Betacoronavirus; COVID-19; Cohort Studies; Coronavirus Infections / diagnosis; Electronic Health Records; Female; Hospital Mortality; Hospitalization / statistics & numerical data; Hospitals; Humans; Machine Learning / standards; Male; Middle Aged; New York City / epidemiology; Pandemics; Pneumonia, Viral / diagnosis; Prognosis; ROC Curve; Risk Assessment / methods; SARS-CoV-2; Young Adult

Keywords: COVID-19; EHR; TRIPOD; clinical informatics; cohort; electronic health record; hospital; machine learning; mortality; performance; prediction

Journal

Journal of medical Internet research

ISSN: 1438-8871

Abbreviated title: J Med Internet Res

Country: Canada

NLM ID: 100959882

Publication information

Publication date:
06 Nov 2020

History:

received: 01 Sep 2020

accepted: 02 Oct 2020

revised: 02 Oct 2020

pubmed: 08 Oct 2020

medline: 25 Nov 2020

entrez: 07 Oct 2020

Status: epublish

Abstract

BACKGROUND

COVID-19 has infected millions of people worldwide and is responsible for several hundred thousand fatalities. The COVID-19 pandemic has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods to meet these needs are lacking.

OBJECTIVE

The aims of this study were to analyze the electronic health records (EHRs) of patients who tested positive for COVID-19 and were admitted to hospitals in the Mount Sinai Health System in New York City; to develop machine learning models for making predictions about the hospital course of the patients over clinically meaningful time horizons based on patient characteristics at admission; and to assess the performance of these models at multiple hospitals and time points.

METHODS

We used Extreme Gradient Boosting (XGBoost) and baseline comparator models to predict in-hospital mortality and critical events within time windows of 3, 5, 7, and 10 days from admission. Our study population included harmonized EHR data from five hospitals in New York City for 4098 COVID-19-positive patients admitted from March 15 to May 22, 2020. The models were first trained on patients admitted to a single hospital on or before May 1 (n=1514). They were then externally validated on patients admitted to the four other hospitals on or before May 1 (n=2201) and prospectively validated on all patients admitted after May 1 (n=383). Finally, we applied model interpretability methods to identify and rank the variables that drive model predictions.
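The site- and date-based split and the outcome windows described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The `Patient` record, its field names, and the site codes are all hypothetical. So is the labelling rule: an outcome counts toward a horizon if it occurs at most h days after admission, and patients who never had the outcome are labelled 0.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical patient record; field names are illustrative, not from the study.
@dataclass
class Patient:
    hospital: str                   # hypothetical site code
    admitted: date                  # admission date
    days_to_event: Optional[float]  # days from admission to the outcome; None if it never occurred


CUTOFF = date(2020, 5, 1)  # admissions on or before this date form the retrospective sets
HORIZONS = (3, 5, 7, 10)   # prediction windows, in days from admission


def split_cohort(patients, development_site):
    """Assign each patient to training, external validation, or prospective validation."""
    train, external, prospective = [], [], []
    for p in patients:
        if p.admitted > CUTOFF:
            prospective.append(p)  # any site, admitted after the cutoff
        elif p.hospital == development_site:
            train.append(p)        # the single development hospital
        else:
            external.append(p)     # the other hospitals, same period
    return train, external, prospective


def horizon_labels(p):
    """Assumed rule: label 1 if the outcome occurred within h days of admission, else 0."""
    return {h: int(p.days_to_event is not None and p.days_to_event <= h) for h in HORIZONS}


# Toy cohort (not study data).
cohort = [
    Patient("A", date(2020, 3, 20), 4.0),
    Patient("B", date(2020, 4, 2), None),
    Patient("C", date(2020, 5, 10), 2.5),
]
train, external, prospective = split_cohort(cohort, "A")
print(len(train), len(external), len(prospective))  # 1 1 1
print(horizon_labels(cohort[0]))                     # {3: 0, 5: 1, 7: 1, 10: 1}
```

Each horizon yields its own binary target, which is why one classifier per outcome and window is the natural setup. Keeping the prospective set defined purely by admission date means it mixes all sites.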

RESULTS

Upon cross-validation, the XGBoost classifier outperformed baseline models, with an area under the receiver operating characteristic curve (AUC-ROC) for mortality of 0.89 at 3 days, 0.85 at 5 and 7 days, and 0.84 at 10 days. XGBoost also performed well for critical event prediction, with an AUC-ROC of 0.80 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. In external validation, XGBoost achieved an AUC-ROC of 0.88 at 3 days, 0.86 at 5 days, 0.86 at 7 days, and 0.84 at 10 days for mortality prediction. Similarly, the unimputed XGBoost model achieved an AUC-ROC of 0.78 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. Trends in performance on prospective validation sets were similar. At 7 days, acute kidney injury on admission, elevated lactate dehydrogenase (LDH), tachypnea, and hyperglycemia were the strongest drivers of critical event prediction, while older age, anion gap, and C-reactive protein were the strongest drivers of mortality prediction.
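The AUC-ROC figures above have a convenient ranking interpretation. The AUC-ROC is the probability that a randomly chosen patient who had the outcome receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it directly from that definition on toy data (not study data). It is not the authors' evaluation code. Library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.

```python
from itertools import product


def auc_roc(labels, scores):
    """AUC-ROC from its ranking definition: the fraction of (positive, negative)
    pairs in which the positive case scores higher, with ties counted as 0.5.
    O(P * N) pairs, which is fine for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Toy data (not from the study): outcome within the horizon, and model risk scores.
y = [1, 0, 1, 0, 0, 1]
p = [0.9, 0.3, 0.6, 0.7, 0.2, 0.4]
print(round(auc_roc(y, p), 3))  # 0.778
```

With these six toy cases, 7 of the 9 positive-negative pairs are ordered correctly, so the AUC-ROC is 7/9 ≈ 0.778. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.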

CONCLUSIONS

We trained machine learning models to predict mortality and critical events in patients with COVID-19 at different time horizons and validated them both externally and prospectively. These models identified at-risk patients and uncovered underlying relationships that predicted outcomes.

Identifiers

DOI: 10.2196/24018

PMID: 33027032

PMCID: PMC7652593

PII: v22i11e24018

Publication types

Journal Article; Research Support, N.I.H., Extramural; Validation Study

Language

English

Pagination

e24018

Grants

Agency: NCATS NIH HHS

ID: UL1 TR001433

Country: United States

Copyright information

©Akhil Vaid, Sulaiman Somani, Adam J Russak, Jessica K De Freitas, Fayzan F Chaudhry, Ishan Paranjpe, Kipp W Johnson, Samuel J Lee, Riccardo Miotto, Felix Richter, Shan Zhao, Noam D Beckmann, Nidhi Naik, Arash Kia, Prem Timsina, Anuradha Lala, Manish Paranjpe, Eddye Golden, Matteo Danieletto, Manbir Singh, Dara Meyer, Paul F O'Reilly, Laura Huckins, Patricia Kovatch, Joseph Finkelstein, Robert M. Freeman, Edgar Argulian, Andrew Kasarskis, Bethany Percha, Judith A Aberg, Emilia Bagiella, Carol R Horowitz, Barbara Murphy, Eric J Nestler, Eric E Schadt, Judy H Cho, Carlos Cordon-Cardo, Valentin Fuster, Dennis S Charney, David L Reich, Erwin P Bottinger, Matthew A Levin, Jagat Narula, Zahi A Fayad, Allan C Just, Alexander W Charney, Girish N Nadkarni, Benjamin S Glicksberg. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 06.11.2020.
