Robust extraction of pneumonia-associated clinical states from electronic health records.
EHR mining
clustering
high dimensionality
multicenter integration
Journal
Proceedings of the National Academy of Sciences of the United States of America
ISSN: 1091-6490
Abbreviated title: Proc Natl Acad Sci U S A
Country: United States
NLM ID: 7505876
Publication information
Publication date:
05 Nov 2024
History:
medline: 30 Oct 2024
pubmed: 30 Oct 2024
entrez: 30 Oct 2024
Status:
ppublish
Abstract
Mining of electronic health records (EHRs) promises to automate the identification of comprehensive disease phenotypes. However, the realization of this promise is hindered by the unavailability of generalizable ground-truth information, data incompleteness and heterogeneity, and the lack of generalization to multiple cohorts. We present here a data-driven approach to identify clinical states that we implement for 585 critical care patients with suspected pneumonia recruited by the SCRIPT study, which we compare to and integrate with 9,918 pneumonia patients from the MIMIC-IV dataset. We extract and curate from their structured EHRs a primary set of clinical features (53 and 59 features for SCRIPT and MIMIC-IV, respectively), including disease severity scores, vital signs, and other variables, with varying degrees of completeness. We aggregate irregular time series into daily frequency, resulting in 12,495 and 94,684 patient-day pairs for SCRIPT and MIMIC-IV, respectively. We define a "common-sense" ground truth that we then use in a semisupervised pipeline to optimize choices for data preprocessing, and reduce the feature space to four principal components. We describe and validate an ensemble-based clustering method that enables us to robustly identify five clinical states, and use a Gaussian mixture model to quantify uncertainty in cluster assignment. Demonstrating the clinical relevance of the identified states, we find that three states are strongly associated with disease outcomes (dying vs. recovering), while the other two reflect disease etiology. The outcome-associated clinical states provide significantly increased discrimination of mortality rates over standard approaches.
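Two steps named in the abstract, daily aggregation of irregular time series and soft cluster assignment under a Gaussian mixture, can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline. The function names, the sample heart-rate values, the use of the mean as the daily summary, and the one-dimensional two-component mixture parameters are all assumptions made for illustration; the study itself works in a space of four principal components and identifies five states.

```python
from collections import defaultdict
from datetime import datetime
from math import exp, pi, sqrt
from statistics import mean


def aggregate_daily(records):
    """Collapse irregular (patient_id, timestamp, value) records into
    one mean value per (patient_id, calendar day) pair.

    Using the mean is an assumption; the abstract does not say which
    summary statistic is applied.
    """
    buckets = defaultdict(list)
    for patient_id, timestamp, value in records:
        buckets[(patient_id, timestamp.date())].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


def soft_assignment(x, weights, means, stds):
    """Posterior probability of each 1-D Gaussian component given x.

    A probability spread across components signals an uncertain
    cluster assignment.
    """
    densities = [
        w * exp(-0.5 * ((x - m) / s) ** 2) / (s * sqrt(2 * pi))
        for w, m, s in zip(weights, means, stds)
    ]
    total = sum(densities)
    return [d / total for d in densities]


# Hypothetical heart-rate measurements taken at irregular times.
records = [
    ("p1", datetime(2024, 1, 1, 3, 15), 88.0),
    ("p1", datetime(2024, 1, 1, 17, 40), 92.0),
    ("p1", datetime(2024, 1, 2, 9, 5), 101.0),
    ("p2", datetime(2024, 1, 1, 12, 0), 76.0),
]
daily = aggregate_daily(records)  # three patient-day pairs

# Hypothetical two-component mixture (parameters chosen by hand, not fitted).
posterior = soft_assignment(
    95.0, weights=[0.5, 0.5], means=[80.0, 100.0], stds=[10.0, 10.0]
)
```

Each key of `daily` is one patient-day pair, the unit counted in the abstract. In `posterior`, the second component receives most of the probability mass for a value of 95.0, but not all of it; the remainder quantifies how uncertain that assignment is.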
Identifiers
pubmed: 39475648
doi: 10.1073/pnas.2417688121
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
e2417688121
Grants
Agency: HHS | NIH | National Heart, Lung, and Blood Institute (NHLBI)
ID: R01HL140362
Agency: HHS | NIH | NIAID | Division of Intramural Research (DIR, NIAID)
ID: U19AI135964
Agency: HHS | NIH | National Institute of General Medical Sciences (NIGMS)
ID: T32GM153505
Conflict of interest statement
Competing interests statement: The authors declare no competing interest.