Automated grouping of medical codes via multiview banded spectral clustering.

Data-driven grouping; Electronic health records (EHR); International Classification of Disease (ICD); Multiple data sources; Spectral clustering

Journal

Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413

Publication information

Publication date:
2019-12
History:
received: 2019-04-01
revised: 2019-10-25
accepted: 2019-10-27
pubmed: 2019-11-02
medline: 2020-10-21
entrez: 2019-11-02
Status: ppublish

Abstract

With their increasingly widespread adoption, electronic health records (EHR) have enabled phenotypic information extraction at an unprecedented granularity and scale. However, a medical concept (e.g. diagnosis, prescription, symptom) is often described by different synonyms across EHR systems, which hinders data integration for signal enhancement and complicates dimensionality reduction for knowledge discovery. Despite existing ontologies and hierarchies, curation and maintenance require tremendous human effort, a process that is both unscalable and susceptible to subjective biases. This paper aims to develop a data-driven approach that automates the grouping of medical terms into clinically relevant concepts by combining multiple up-to-date data sources in an unbiased manner. We present a novel data-driven grouping approach, multi-view banded spectral clustering (mvBSC), which combines summary data from multiple healthcare systems. The proposed method consists of a banding step that leverages prior knowledge from the existing coding hierarchy, and a combining step that performs spectral clustering on an optimally weighted matrix. We apply the proposed method to group ICD-9 and ICD-10-CM codes together by integrating data from two healthcare systems. We show grouping results and hierarchies for 13 representative disease categories. Individual grouping qualities were evaluated using normalized mutual information, adjusted Rand index, and F The proposed approach, by systematically leveraging multiple data sources, is able to overcome bias while maximizing consensus to achieve generalizability. It has the advantage of being efficient, scalable, and adaptive to the evolving human knowledge reflected in the data, and represents a significant step toward automating medical knowledge integration.
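
The two steps named in the abstract (banding, then spectral clustering on a weighted combination of views) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `band`, `combine`, and `spectral_labels`, the toy similarity matrices, the bandwidth, and the equal weights are all assumptions made for illustration. In particular, the abstract speaks of an optimally weighted matrix, whereas this sketch simply uses fixed weights, and it assumes that codes are ordered so that neighbours in the coding hierarchy sit next to each other.

```python
import numpy as np

def band(similarity, bandwidth):
    """Zero out entries for code pairs more than `bandwidth` positions apart.

    Assumes rows/columns follow the hierarchy ordering (an assumption here).
    """
    idx = np.arange(similarity.shape[0])
    mask = np.abs(idx[:, None] - idx[None, :]) <= bandwidth
    return np.where(mask, similarity, 0.0)

def combine(views, weights):
    """Weighted average of the views; weights are fixed, not optimised."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * v for w, v in zip(weights, views))

def spectral_labels(matrix, n_clusters, n_iter=50):
    """Cluster rows of the leading eigenvectors with a small k-means."""
    sym = (matrix + matrix.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)
    # Keep the eigenvectors with the largest absolute eigenvalues.
    order = np.argsort(-np.abs(eigvals))[:n_clusters]
    embedding = eigvecs[:, order]
    # Deterministic farthest-point initialisation of the cluster centres.
    centers = [embedding[0]]
    for _ in range(1, n_clusters):
        dists = np.min(
            [np.sum((embedding - c) ** 2, axis=1) for c in centers], axis=0
        )
        centers.append(embedding[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((embedding[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = embedding[labels == k].mean(axis=0)
    return labels

def toy_view(within, across, n=6, split=3):
    """Hypothetical similarity matrix with two groups of codes."""
    view = np.full((n, n), across)
    view[:split, :split] = within
    view[split:, split:] = within
    np.fill_diagonal(view, 1.0)
    return view

# Two hypothetical "healthcare system" views of six codes in two groups.
view_a = toy_view(within=0.9, across=0.1)
view_b = toy_view(within=0.8, across=0.2)
# A spurious long-range similarity in one view, removed by banding.
view_b[0, 5] = view_b[5, 0] = 0.7

banded = [band(v, bandwidth=3) for v in (view_a, view_b)]
combined = combine(banded, weights=[0.5, 0.5])
labels = spectral_labels(combined, n_clusters=2)
print(labels)
```

In this toy setup the banding step discards the spurious similarity between the first and last code because they lie far apart in the assumed ordering, which is the role the abstract assigns to prior knowledge from the coding hierarchy. In practice a library routine such as scikit-learn's spectral clustering could replace the hand-written k-means used here.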

Identifiers

pubmed: 31672532
pii: S1532-0464(19)30241-2
doi: 10.1016/j.jbi.2019.103322
pmc: PMC7261410
mid: NIHMS1585756

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

103322

Grants

Agency: NIAMS NIH HHS
ID: L30 AR070514
Country: United States
Agency: NIAMS NIH HHS
ID: P30 AR072577
Country: United States
Agency: NIAMS NIH HHS
ID: T32 AR055885
Country: United States

Copyright information

Copyright © 2019. Published by Elsevier Inc.

Authors

Luwan Zhang (L)

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA. Electronic address: lzhang@hsph.harvard.edu.

Yichi Zhang (Y)

Department of Computer Science and Statistics, University of Rhode Island, Kingston, RI, USA.

Tianrun Cai (T)

Division of Rheumatology, Brigham and Women's Hospital, Boston, MA, USA; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA.

Yuri Ahuja (Y)

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Zeling He (Z)

Division of Rheumatology, Brigham and Women's Hospital, Boston, MA, USA; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA.

Yuk-Lam Ho (YL)

Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA.

Andrew Beam (A)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Kelly Cho (K)

Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA; Division of Aging, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA.

Robert Carroll (R)

Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.

Joshua Denny (J)

Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.

Isaac Kohane (I)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Katherine Liao (K)

Division of Rheumatology, Brigham and Women's Hospital, Boston, MA, USA; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Tianxi Cai (T)

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
