Practical use case of natural language processing for observational clinical research data retrieval from electronic health records: AssistMED project.
Journal
Polish archives of internal medicine
ISSN: 1897-9483
Abbreviated title: Pol Arch Intern Med
Country: Poland
NLM ID: 101700960
Publication information
Publication date:
19 Mar 2024
History:
medline: 19 Mar 2024
pubmed: 19 Mar 2024
entrez: 19 Mar 2024
Status:
aheadofprint
Abstract
Electronic health records (EHRs) contain data valuable for clinical research, but in textual format, so a human must encode them into databases, which is a lengthy and costly process. Natural language processing (NLP) is a computational technique that allows text analysis. This study aimed to demonstrate a practical use case of NLP for characterizing a large retrospective study cohort and to compare it with human retrieval. Anonymized discharge documentation of 10314 patients from a tertiary care cardiology department was analyzed for inclusion in the CRAFT registry (NCT02987062) of patients with atrial fibrillation (AF). Extensive clinical characteristics regarding concomitant diseases, medications, daily dosage, and echocardiography were collected manually and through NLP. There were 3030 and 3029 patients identified by the human and NLP-based approaches, respectively, reflecting 99.93% accuracy of NLP in detecting AF. Retrieval of comprehensive baseline patient characteristics by NLP was faster than human analysis (3 hours and 15 minutes vs 71 hours and 12 minutes). The CHA2DS2-VASc and HAS-BLED scores calculated with both methods did not differ (human vs NLP; median, IQR, P value): 3 (2-5) vs 3 (2-5), P = 0.74, and 1 (1-2) vs 1 (1-2), P = 0.63, respectively. For most data, almost perfect agreement between NLP-retrieved and human-retrieved characteristics was found; daily dosage identification was the least accurate NLP feature. Both approaches would lead to similar conclusions on cohort characteristics; however, daily dosage detection for some drug groups would require additional human validation in the NLP-based cohort. Applying NLP to EHRs may accelerate data acquisition and provide accurate data for a retrospective study.
Identifiers
pubmed: 38501989
doi: 10.20452/pamw.16704
Publication types
Journal Article
Languages
eng
Citation subsets
IM