MixEHR-Guided: A guided multi-modal topic modeling approach for large-scale automatic phenotyping using the electronic health record.


Journal

Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413

Publication information

Publication date:
10 2022
History:
received: 06 02 2022
revised: 27 08 2022
accepted: 28 08 2022
pubmed: 5 9 2022
medline: 14 10 2022
entrez: 4 9 2022
Status: ppublish

Abstract

Electronic Health Records (EHRs) contain rich clinical data collected at the point of care, and their increasing adoption offers exciting opportunities for clinical informatics, disease risk prediction, and personalized treatment recommendation. However, effective use of EHR data for research and clinical decision support is often hampered by a lack of reliable disease labels. To compile gold-standard labels, researchers often rely on clinical experts to develop rule-based phenotyping algorithms from billing codes and other surrogate features. This process is tedious and error-prone due to recall and observer biases in how codes and measures are selected, and some phenotypes are incompletely captured by a handful of surrogate features. To address this challenge, we present a novel automatic phenotyping model called MixEHR-Guided (MixEHR-G), a multimodal hierarchical Bayesian topic model that efficiently models the EHR generative process by identifying latent phenotype structure in the data. Unlike existing topic modeling algorithms wherein the inferred topics are not identifiable, MixEHR-G uses prior information from informative surrogate features to align topics with known phenotypes. We applied MixEHR-G to an openly available EHR dataset of 38,597 intensive care patients (MIMIC-III) in Boston, USA and to administrative claims data for a population-based cohort (PopHR) of 1.3 million people in Quebec, Canada. Qualitatively, we demonstrate that MixEHR-G learns interpretable phenotypes and yields meaningful insights about phenotype similarities, comorbidities, and epidemiological associations. Quantitatively, MixEHR-G outperforms existing unsupervised phenotyping methods on a phenotype label annotation task, and it can accurately estimate relative phenotype prevalence functions without gold-standard phenotype information. Altogether, MixEHR-G is an important step towards building an interpretable and automated phenotyping system using EHR data.
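
The idea of aligning topics with known phenotypes through surrogate-feature priors can be illustrated with a toy sketch. This is not the authors' implementation: MixEHR-G is a multimodal hierarchical Bayesian model, whereas the code below is a plain single-modality LDA-style collapsed Gibbs sampler in which each patient receives a Dirichlet prior that favors a topic only when the patient's record contains that phenotype's surrogate code. The function name `guided_lda`, the toy code vocabulary, the surrogate mapping, and the prior values (1.0 and 0.001) are all assumptions made for illustration.

```python
import numpy as np

def guided_lda(docs, n_topics, vocab_size, alpha, beta=0.1, n_iter=50, seed=0):
    """Toy collapsed Gibbs sampler with patient-specific topic priors.

    docs  : list of lists of integer code ids (one list per patient)
    alpha : (n_patients, n_topics) array of Dirichlet priors; this is where
            the "guidance" from surrogate features enters (an assumption of
            this sketch, not the published model).
    """
    rng = np.random.default_rng(seed)
    n_docs = len(docs)
    ndk = np.zeros((n_docs, n_topics))      # patient-topic counts
    nkw = np.zeros((n_topics, vocab_size))  # topic-code counts
    nk = np.zeros(n_topics)                 # tokens per topic
    z = []
    # Initialise topic assignments by sampling from the guided prior.
    for d, doc in enumerate(docs):
        zd = rng.choice(n_topics, size=len(doc), p=alpha[d] / alpha[d].sum())
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Conditional over topics for this token.
                p = (ndk[d] + alpha[d]) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return theta, phi

# Hypothetical toy data: codes 0-2 relate to phenotype 0, codes 3-5 to phenotype 1.
docs = [[0, 1, 2, 1, 2, 0], [3, 4, 5, 4, 5, 3], [0, 1, 1, 2], [3, 5, 4, 4]]
surrogate = {0: 0, 1: 3}  # phenotype index -> surrogate code id (assumed)

# Build the guided prior: large where the surrogate code is present, tiny otherwise.
alpha = np.full((len(docs), 2), 0.001)
for d, doc in enumerate(docs):
    for k, code in surrogate.items():
        if code in doc:
            alpha[d, k] = 1.0

theta, phi = guided_lda(docs, n_topics=2, vocab_size=6, alpha=alpha)
print(np.round(theta, 3))  # per-patient phenotype mixtures
print(np.round(phi, 3))    # per-phenotype code distributions
```

Because the prior ties topic 0 to surrogate code 0 and topic 1 to surrogate code 3, the topic indices are identifiable in this toy setting: each row of `theta` can be read directly as a patient's phenotype mixture rather than requiring post hoc topic labeling.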

Identifiers

pubmed: 36058522
pii: S1532-0464(22)00197-6
doi: 10.1016/j.jbi.2022.104190

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

104190

Copyright information

Copyright © 2022 The Author(s). Published by Elsevier Inc. All rights reserved.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Yuri Ahuja (Y)

Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA; Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA. Electronic address: yuri_ahuja@hms.harvard.edu.

Yuesong Zou (Y)

School of Computer Science, McGill University, 3480 Rue University, Montreal, QC H3A 2A7, Canada.

Aman Verma (A)

School of Population and Global Health, McGill University, 2001 McGill College Avenue, Montreal, Québec H3A 1G1, Canada.

David Buckeridge (D)

School of Population and Global Health, McGill University, 2001 McGill College Avenue, Montreal, Québec H3A 1G1, Canada. Electronic address: david.buckeridge@mcgill.ca.

Yue Li (Y)

School of Computer Science, McGill University, 3480 Rue University, Montreal, QC H3A 2A7, Canada. Electronic address: yueli@cs.mcgill.ca.


MeSH classifications