Automated Categorization of Systemic Disease and Duration From Electronic Medical Record System Data Using Finite-State Machine Modeling: Prospective Validation Study.
algorithms
data analysis
electronic health records
machine learning
ophthalmology
Journal
JMIR formative research
ISSN: 2561-326X
Abbreviated title: JMIR Form Res
Country: Canada
NLM ID: 101726394
Publication information
Publication date:
17 Dec 2020
History:
received: 22 Sep 2020
accepted: 17 Nov 2020
revised: 12 Nov 2020
entrez: 17 Dec 2020
pubmed: 18 Dec 2020
medline: 18 Dec 2020
Status:
epublish
Abstract
BACKGROUND
One of the major challenges in the health care sector is that approximately 80% of generated data remains unstructured and unused. Since it is difficult to handle unstructured data from electronic medical record systems, it tends to be neglected for analyses in most hospitals and medical centers. Therefore, there is a need to analyze unstructured big data in health care systems so that we can optimally utilize and unearth all unexploited information from it.
OBJECTIVE
In this study, we aimed to extract a list of diseases and associated keywords, along with the corresponding time durations, from an indigenously developed electronic medical record system and to describe the analytics that the acquired datasets make possible.
METHODS
We propose a novel finite-state machine to sequentially detect and cluster disease names from patients' medical history. We defined 3 states for the finite-state machine and a transition matrix whose entries depend on the identified keyword. We also defined a state-change action matrix, which specifies the action associated with each transition. The dataset used in this study was obtained from an indigenously developed electronic medical record system called eyeSmart, which was implemented across a large, multitier ophthalmology network in India. The dataset included patients' past medical history and contained records of 10,000 distinct patients.
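The abstract does not name the 3 states or list the matrix entries, so the following is only a minimal sketch of how a 3-state machine with a transition matrix and a state-change action matrix could be arranged. The state names (`IDLE`, `IN_DISEASE`, `IN_DURATION`), the token classes, the small disease vocabulary, and the duration pattern are all assumptions made for illustration; they are not taken from the study.

```python
import re
from enum import Enum, auto

class State(Enum):          # hypothetical state names, not from the study
    IDLE = auto()
    IN_DISEASE = auto()
    IN_DURATION = auto()

class Tok(Enum):            # hypothetical token classes
    DISEASE = auto()
    DURATION = auto()
    OTHER = auto()

DISEASES = {"diabetes", "hypertension", "asthma"}   # illustrative vocabulary
DURATION_RE = re.compile(r"\d+\s*(?:years?|months?|days?)")
TOKEN_RE = re.compile(r"\d+\s*(?:years?|months?|days?)\b|[a-z]+")

# Transition matrix: next state for each (current state, token class).
TRANSITION = {
    State.IDLE:        {Tok.DISEASE: State.IN_DISEASE, Tok.DURATION: State.IDLE,        Tok.OTHER: State.IDLE},
    State.IN_DISEASE:  {Tok.DISEASE: State.IN_DISEASE, Tok.DURATION: State.IN_DURATION, Tok.OTHER: State.IN_DISEASE},
    State.IN_DURATION: {Tok.DISEASE: State.IN_DISEASE, Tok.DURATION: State.IN_DURATION, Tok.OTHER: State.IDLE},
}

# State-change action matrix: action performed on each transition.
ACTION = {
    State.IDLE:        {Tok.DISEASE: "open",      Tok.DURATION: "skip",         Tok.OTHER: "skip"},
    State.IN_DISEASE:  {Tok.DISEASE: "emit_open", Tok.DURATION: "set_duration", Tok.OTHER: "skip"},
    State.IN_DURATION: {Tok.DISEASE: "emit_open", Tok.DURATION: "set_duration", Tok.OTHER: "emit"},
}

def classify(token):
    """Map a token to its class using the illustrative vocabulary."""
    if token in DISEASES:
        return Tok.DISEASE
    if DURATION_RE.fullmatch(token):
        return Tok.DURATION
    return Tok.OTHER

def extract(text):
    """Return (disease, duration or None) pairs found in free text."""
    state, current, records = State.IDLE, None, []
    for tok in TOKEN_RE.findall(text.lower()):
        kind = classify(tok)
        action = ACTION[state][kind]
        # Close the open record before starting a new one or going idle.
        if action in ("emit", "emit_open") and current is not None:
            records.append(current)
            current = None
        if action in ("open", "emit_open"):
            current = (tok, None)
        elif action == "set_duration":
            current = (current[0], tok)
        state = TRANSITION[state][kind]
    if current is not None:   # flush a record still open at end of input
        records.append(current)
    return records

if __name__ == "__main__":
    print(extract("Diabetes since 10 years, hypertension 2 months on medication"))
```

Keeping transitions and actions in two separate lookup tables means the scanning loop stays unchanged when the vocabulary or the rules are revised; only the table entries need editing.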
RESULTS
We extracted disease names and associated keywords by using the finite-state machine with an accuracy of 95%, sensitivity of 94.9%, and positive predictive value of 100%. For the extraction of the duration of disease, the machine's accuracy was 93%, sensitivity was 92.9%, and the positive predictive value was 100%.
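For readers less familiar with these measures, the sketch below shows the standard definitions of accuracy, sensitivity, and positive predictive value in terms of confusion-matrix counts. The function name `diagnostic_metrics` and the counts in the usage lines are hypothetical and are not the study's data; they only illustrate that a positive predictive value of 100% corresponds to zero false positives.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute accuracy, sensitivity, and PPV from confusion-matrix counts.

    Assumes every denominator is nonzero.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),   # share of true cases that were found
        "ppv": tp / (tp + fp),           # share of extractions that were correct
    }

if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the study).
    print(diagnostic_metrics(tp=90, fp=0, fn=5, tn=5))
```

With no false positives, every reported extraction is correct, so the remaining errors are all missed cases, which lower sensitivity and accuracy but not the positive predictive value.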
CONCLUSIONS
We demonstrated that the finite-state machine we developed in this study can be used to accurately identify disease names, associated keywords, and time durations from a large cohort of patient records obtained using an electronic medical record system.
Identifiers
pubmed: 33331823
pii: v4i12e24490
doi: 10.2196/24490
pmc: PMC7775202
Publication types
Journal Article
Languages
eng
Pagination
e24490
Copyright information
©Gumpili Sai Prashanthi, Ayush Deva, Ranganath Vadapalli, Anthony Vipin Das. Originally published in JMIR Formative Research (http://formative.jmir.org), 17.12.2020.