Sparse multi-output Gaussian processes for online medical time series prediction.

Electronic health records; Gaussian processes; Sparse time series; Spectral mixture kernel

Journal

BMC medical informatics and decision making
ISSN: 1472-6947
Abbreviated title: BMC Med Inform Decis Mak
Country: England
NLM ID: 101088682

Publication information

Publication date:
08 07 2020
History:
received: 09 11 2018
accepted: 05 03 2020
entrez: 10 7 2020
pubmed: 10 7 2020
medline: 5 1 2021
Status: epublish

Abstract

BACKGROUND
For real-time monitoring of hospital patients, high-quality inference of patients' health status using all information available from clinical covariates and lab test results is essential to enable successful medical interventions and improve patient outcomes. Developing a computational framework that can learn from observational large-scale electronic health records (EHRs) and make accurate real-time predictions is a critical step. In this work, we develop and explore a Bayesian nonparametric model based on multi-output Gaussian process (GP) regression for hospital patient monitoring.
METHODS
We propose MedGP, a statistical framework that incorporates 24 clinical covariates and supports a rich reference data set from which relationships between observed covariates may be inferred and exploited for high-quality inference of patient state over time. To do this, we develop a highly structured sparse GP kernel to enable tractable computation over tens of thousands of time points while estimating correlations among clinical covariates, patients, and periodicity in patient observations. MedGP has a number of benefits over current methods, including (i) not requiring an alignment of the time series data, (ii) quantifying confidence regions in the predictions, (iii) exploiting a vast and rich database of patients, and (iv) inferring interpretable relationships among clinical covariates.
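The METHODS paragraph describes a multi-output kernel that captures correlations among clinical covariates and periodicity in the observations. As a hedged illustration only, the sketch below assumes a linear model of coregionalization (LMC) whose temporal parts are spectral mixture components. It is not MedGP's actual sparse kernel: it omits the patient-level correlations and the sparsity structure, and every name, time point, and parameter value in it is hypothetical.

```python
import numpy as np


def sm_component(tau, mu, v):
    """One spectral-mixture component: exp(-2 pi^2 tau^2 v) * cos(2 pi mu tau)."""
    return np.exp(-2.0 * np.pi ** 2 * tau ** 2 * v) * np.cos(2.0 * np.pi * mu * tau)


def lmc_sm_kernel(t1, d1, t2, d2, B, mus, vs):
    """Covariance between observations (t1[i], d1[i]) and (t2[j], d2[j]).

    t1, t2: observation times (hours); d1, d2: integer covariate labels.
    B: array (Q, D, D) of positive semidefinite coregionalization matrices.
    mus, vs: length-Q spectral means (cycles/hour) and spectral variances.
    """
    tau = t1[:, None] - t2[None, :]              # pairwise time lags
    K = np.zeros(tau.shape)
    for q in range(len(mus)):
        # covariate-by-covariate weight times a stationary temporal component
        K += B[q][d1[:, None], d2[None, :]] * sm_component(tau, mus[q], vs[q])
    return K


# Toy example: 2 covariates observed at irregular, non-aligned times.
t = np.array([0.0, 0.7, 1.5, 3.2, 0.3, 2.0])
d = np.array([0, 0, 0, 0, 1, 1])
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2, 2))                   # hypothetical mixing weights
B = np.stack([A[q] @ A[q].T + 0.1 * np.eye(2) for q in range(2)])
mus = np.array([0.0, 1.0 / 24.0])                # slow trend + 24-hour cycle
vs = np.array([0.05, 0.001])
K = lmc_sm_kernel(t, d, t, d, B, mus, vs)
```

Because every pair of observations is weighted by `B[q]` and the time lags are computed directly, the covariates need not be sampled at shared time points. The choice of two components (a slow trend plus a 24-hour cycle) is arbitrary here.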
RESULTS
We evaluate and compare results from MedGP on the task of online prediction for three patient subgroups from two medical data sets across 8,043 patients. We find MedGP improves online prediction over baseline and state-of-the-art methods for nearly all covariates across different disease subgroups and hospitals.
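Online prediction means the model conditions only on measurements made so far, across all covariates, and forecasts the next measurement with a confidence region. The sketch below illustrates that loop under stated assumptions: it substitutes a simpler intrinsic coregionalization model (ICM) with a squared-exponential time kernel for MedGP's kernel. The hyperparameters (`B`, `ell`, `noise_var`) are fixed by hand rather than inferred from reference patients, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def icm_kernel(t1, d1, t2, d2, B, ell):
    """Hypothetical ICM kernel: B[d, d'] * exp(-(t - t')^2 / (2 ell^2))."""
    tau = t1[:, None] - t2[None, :]
    return B[d1[:, None], d2[None, :]] * np.exp(-0.5 * (tau / ell) ** 2)


def predict(t_obs, d_obs, y_obs, t_new, d_new, B, ell, noise_var):
    """GP posterior mean and latent variance at (t_new, d_new) given past data."""
    K = icm_kernel(t_obs, d_obs, t_obs, d_obs, B, ell) + noise_var * np.eye(len(t_obs))
    K_s = icm_kernel(t_new, d_new, t_obs, d_obs, B, ell)     # (n_new, n_obs)
    L = np.linalg.cholesky(K)                                 # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))   # K^{-1} y
    mean = K_s @ alpha
    V = np.linalg.solve(L, K_s.T)                             # L^{-1} K_s^T
    var = np.diag(B)[d_new] - np.sum(V ** 2, axis=0)          # prior minus explained
    return mean, np.maximum(var, 0.0)


# Toy, synthetic stream: covariate 0 or 1 measured at irregular times (hours).
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 24.0, size=30))
d = rng.integers(0, 2, size=30)
y = np.sin(2 * np.pi * t / 24) * np.where(d == 0, 1.0, 0.8) + 0.05 * rng.normal(size=30)
B = np.array([[1.0, 0.8], [0.8, 1.0]])                        # assumed covariate correlation

# Online loop: before each new measurement arrives, predict it from the past only.
preds = []
for k in range(5, 30):
    mu, var = predict(t[:k], d[:k], y[:k], t[k:k + 1], d[k:k + 1], B, ell=3.0, noise_var=0.01)
    half = 1.96 * np.sqrt(var)        # approx. 95% interval half-width for the latent value
    preds.append((mu[0], mu[0] - half[0], mu[0] + half[0]))
```

Refactoring the full Cholesky factor at every step costs O(n^3) in the number of past observations. This cost is why a structured, sparse kernel matters once a patient's record reaches tens of thousands of time points.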
CONCLUSIONS
The MedGP framework is robust and efficient in estimating the temporal dependencies from sparse and irregularly sampled medical time series data for online prediction. The code is publicly available at https://github.com/bee-hive/MedGP.

Identifiers

pubmed: 32641134
doi: 10.1186/s12911-020-1069-4
pii: 10.1186/s12911-020-1069-4
pmc: PMC7341595

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

152

Grants

Agency: NIMH NIH HHS
ID: R01 MH101822
Country: United States
Agency: National Science Foundation
ID: AWD1005627
Country: International
Agency: NHLBI NIH HHS
ID: R01 HL133218
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG007900
Country: United States

Authors

Li-Fang Cheng (LF)

Department of Electrical Engineering, Princeton University, Princeton, USA.

Bianca Dumitrascu (B)

Lewis-Sigler Institute, Princeton University, Princeton, NJ, USA.

Gregory Darnell (G)

Lewis-Sigler Institute, Princeton University, Princeton, NJ, USA.

Corey Chivers (C)

University of Pennsylvania Health System, Philadelphia, PA, USA.

Michael Draugelis (M)

University of Pennsylvania Health System, Philadelphia, PA, USA.

Kai Li (K)

Department of Computer Science, Princeton University, Princeton, NJ, USA.

Barbara E Engelhardt (BE)

Department of Computer Science, Princeton University, Princeton, NJ, USA. bee@princeton.edu.
Center for Statistics and Machine Learning, Princeton University, Princeton, NJ, USA. bee@princeton.edu.

Similar articles

Selecting optimal software code descriptors-The case of Java.

Yegor Bugayenko, Zamira Kholmatova, Artem Kruglov et al.

MeSH classifications