Empirical Findings on the Role of Structured Data, Unstructured Data, and their Combination for Automatic Clinical Phenotyping.
Journal
AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science
ISSN: 2153-4063
Abbreviated title: AMIA Jt Summits Transl Sci Proc
Country: United States
NLM ID: 101539486
Publication information
Publication date:
History:
entrez: 2021-08-30
pubmed: 2021-08-31
medline: 2021-09-11
Status:
epublish
Abstract
The objective of this study is to explore the role of structured and unstructured data in clinical phenotyping by determining which types of clinical phenotypes are best identified using unstructured data (e.g., clinical notes), structured data (e.g., laboratory values, vital signs), or their combination, across 172 clinical phenotypes. Specifically, we used laboratory and chart measurements as well as clinical notes from the MIMIC-III critical care database and trained an LSTM on features extracted from each type of data to determine which categories of phenotypes were best identified by structured data, unstructured data, or both. We observed that textual features on their own outperformed structured features for 145 (84%) of the phenotypes, and that Doc2Vec was the most effective representation of unstructured data for all phenotypes. When evaluating the impact of adding textual features to systems that previously relied only on structured features, we found a statistically significant (p < 0.05) increase in phenotyping performance for 51 phenotypes (primarily involving the circulatory system, injury, and poisoning), a statistically significant decrease for one phenotype (diabetes without complications), and no statistically significant change for the remaining 120 phenotypes. We provide analysis of which phenotypes are best identified by each type of data and guidance on which data sources to consider in future research on phenotype identification.
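The counts reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names and the `share` helper are assumptions introduced here, not part of the study, and only the counts themselves come from the abstract.

```python
# Counts as reported in the abstract; all names here are illustrative.
TOTAL_PHENOTYPES = 172
text_beats_structured = 145  # textual features alone outperform structured features
improved = 51                # significant gain when text is added to structured features
degraded = 1                 # significant loss (diabetes without complications)
unchanged = 120              # no statistically significant change

# The three outcome groups should partition the full phenotype set.
assert improved + degraded + unchanged == TOTAL_PHENOTYPES


def share(count: int, total: int = TOTAL_PHENOTYPES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


summary = {
    "text_beats_structured": share(text_beats_structured),
    "improved": share(improved),
    "degraded": share(degraded),
    "unchanged": share(unchanged),
}
print(summary)
```

The three outcome groups sum to the 172 phenotypes studied, and 145 of 172 is consistent with the 84% quoted in the abstract.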
Publication types
Journal Article
Research Support, N.I.H., Intramural
Languages
eng
Citation subsets
IM
Pagination
445-454
Copyright information
©2021 AMIA - All rights reserved.