Ensemble Approaches to Recognize Protected Health Information in Radiology Reports.

Humans; Bayes Theorem; Natural Language Processing; Electronic Health Records; Machine Learning; Radiology

De-identification; Ensemble models; Machine learning; Natural language processing; Protected health information (PHI); Reporting

Journal

Journal of digital imaging

ISSN: 1618-727X

Abbreviated title: J Digit Imaging

Country: United States

NLM ID: 9100529

Publication information

Publication date:
Dec 2022

History:

received: 29 Mar 2022

revised: 02 Jun 2022

accepted: 07 Jun 2022

entrez: 17 Jun 2022

pubmed: 18 Jun 2022

medline: 03 Dec 2022

Status: ppublish

Abstract

Natural language processing (NLP) techniques applied to electronic health records (EHRs) have shown great potential to improve the quality of medical care. The text of radiology reports frequently constitutes a large fraction of EHR data and can provide valuable information about patients' diagnoses, medical history, and imaging findings. The lack of a major public repository of radiology reports severely limits the development, testing, and application of new NLP tools. De-identification of protected health information (PHI) presents a major challenge to building such repositories, because many automated de-identification tools were trained or designed for clinical notes and do not perform well enough to build a public database of radiology reports. We developed and evaluated six ensemble models based on three publicly available de-identification tools: MIT de-id, NeuroNER, and Philter. A set of 1023 reports was set aside as the test partition. Two individuals with medical training annotated the test set for PHI; differences were resolved by consensus. Ensemble methods included simple voting schemes (1-Vote, 2-Votes, and 3-Votes), a decision tree, a naïve Bayesian classifier, and AdaBoost boosting. The 1-Vote ensemble achieved a recall of 998 / 1043 (95.7%); the 3-Votes ensemble had a precision of 1035 / 1043 (99.2%). F1 scores were 93.4% for the decision tree, 71.2% for the naïve Bayesian classifier, and 87.5% for the boosting method. Basic voting algorithms and machine learning classifiers that incorporate the predictions of multiple tools can outperform each tool acting alone in de-identifying radiology reports. Ensemble methods hold substantial potential to improve automated de-identification of radiology reports, making such reports more available for research that can improve patient care and outcomes.
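The voting ensembles named in the abstract can be read as k-of-n rules: with three tools, 1-Vote flags a token if any tool flags it (favoring recall), 3-Votes only if all three agree (favoring precision), and 2-Votes takes the majority. The Python below is a minimal sketch of that reading, assuming each tool's output has already been aligned to a shared token sequence labeled 1 (PHI) or 0 (not PHI). The token-level granularity, the function names `k_vote` and `precision_recall_f1`, and the toy labels are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def k_vote(predictions: Sequence[Sequence[int]], k: int) -> List[int]:
    """Label token i as PHI (1) when at least k tools label it PHI.

    predictions[t][i] is tool t's 0/1 label for token i. This token-level
    encoding is a hypothetical simplification, not the study's data format.
    """
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all tools must label the same token sequence")
    return [1 if sum(p[i] for p in predictions) >= k else 0 for i in range(n_tokens)]


def precision_recall_f1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the PHI class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels for six tokens (hypothetical, not study data).
    gold = [1, 1, 0, 1, 0, 0]
    tool_outputs = [
        [1, 0, 0, 1, 1, 0],  # stands in for MIT de-id
        [1, 1, 0, 0, 0, 0],  # stands in for NeuroNER
        [1, 1, 1, 1, 0, 0],  # stands in for Philter
    ]
    for k in (1, 2, 3):
        pred = k_vote(tool_outputs, k)
        p, r, f = precision_recall_f1(gold, pred)
        print(f"{k}-vote: {pred} precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

On these toy labels, the 1-vote rule reaches recall 1.00 at precision 0.60, while the 3-vote rule reaches precision 1.00 at recall 0.33, the same direction of trade-off the abstract reports. The learned ensembles (decision tree, naïve Bayes, AdaBoost) would plausibly take the per-tool votes as input features instead of applying a fixed threshold, but the record does not specify their feature set.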

Identifiers

DOI: 10.1007/s10278-022-00673-0 PMID: 35715655 PMC: PMC9712864

Publication types

Journal Article

Languages

English

Citation subsets

Pagination

1694-1698

Grants

Agency: NIBIB NIH HHS

ID: T32 EB009384

Country: United States

Copyright information

References

J Am Med Inform Assoc. 2013 Jan 1;20(1):77-83. PMID: 22947391

J Digit Imaging. 2013 Dec;26(6):1045-57. PMID: 23884657

Neuroimaging Clin N Am. 2020 Nov;30(4):447-458. PMID: 33038995

AMIA Annu Symp Proc. 2018 Apr 16;2017:1070-1079. PMID: 29854175

J Am Med Inform Assoc. 2020 Mar 1;27(3):457-470. PMID: 31794016

Radiology. 2016 May;279(2):329-43. PMID: 27089187

BMC Med Inform Decis Mak. 2008 Jul 24;8:32. PMID: 18652655

J Am Med Inform Assoc. 2017 May 1;24(3):596-606. PMID: 28040687

Radiol Artif Intell. 2020 Oct 14;2(6):e190137. PMID: 33937843

NPJ Digit Med. 2020 Apr 14;3:57. PMID: 32337372

JMIR Med Inform. 2019 Apr 27;7(2):e12239. PMID: 31066697

Authors

Hannah Horng

Jackson Steinkamp

Charles E Kahn

Tessa S Cook

Similar articles

[Redispensing of expensive oral anticancer medicines: a practical application].

Smoking Cessation and Incident Cardiovascular Disease.

Evaluation of Low-Value Services Across Major Medicare Advantage Insurers and Traditional Medicare.

Effectiveness of Virtual Yoga for Chronic Low Back Pain: A Randomized Clinical Trial.

MeSH classifications