Ensemble Approaches to Recognize Protected Health Information in Radiology Reports.

Keywords: De-identification; Ensemble models; Machine learning; Natural language processing; Protected health information (PHI); Reporting

Journal

Journal of digital imaging
ISSN: 1618-727X
Abbreviated title: J Digit Imaging
Country: United States
NLM ID: 9100529

Publication information

Publication date:
December 2022
History:
received: 29 March 2022
revised: 2 June 2022
accepted: 7 June 2022
entrez: 17 June 2022
pubmed: 18 June 2022
medline: 3 December 2022
Status: ppublish

Abstract

Natural language processing (NLP) techniques for electronic health records (EHRs) have shown great potential to improve the quality of medical care. The text of radiology reports frequently constitutes a large fraction of EHR data and can provide valuable information about patients' diagnoses, medical history, and imaging findings. The lack of a major public repository of radiology reports severely limits the development, testing, and application of new NLP tools. De-identification of protected health information (PHI) presents a major challenge to building such repositories: many automated de-identification tools were trained or designed for clinical notes and do not perform well enough to build a public database of radiology reports. We developed and evaluated six ensemble models based on three publicly available de-identification tools: MIT de-id, NeuroNER, and Philter. A set of 1023 reports was set aside as the testing partition. Two individuals with medical training annotated the test set for PHI; differences were resolved by consensus. Ensemble methods included simple voting schemes (1-Vote, 2-Votes, and 3-Votes), a decision tree, a naïve Bayesian classifier, and AdaBoost boosting. The 1-Vote ensemble achieved recall of 998/1043 (95.7%); the 3-Votes ensemble had precision of 1035/1043 (99.2%). F1 scores were 93.4% for the decision tree, 71.2% for the naïve Bayesian classifier, and 87.5% for the boosting method. Basic voting algorithms and machine learning classifiers that incorporate the predictions of multiple tools can outperform each tool acting alone in de-identifying radiology reports. Ensemble methods hold substantial potential to improve automated de-identification of radiology reports, making such reports more available for research that can improve patient care and outcomes.
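The k-vote schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each tool's output has already been reduced to a set of flagged token positions, and it assumes "k-Vote" means a position is labeled PHI when at least k of the three tools flag it. The function names `k_vote` and `precision_recall_f1` and the toy data are hypothetical; the learned combiners (decision tree, naïve Bayes, AdaBoost) are not sketched.

```python
from typing import Dict, List, Set, Tuple


def k_vote(predictions: List[Set[int]], k: int) -> Set[int]:
    """Return the token positions flagged as PHI by at least k tools (assumed rule)."""
    counts: Dict[int, int] = {}
    for flagged in predictions:  # one set of flagged positions per tool
        for pos in flagged:
            counts[pos] = counts.get(pos, 0) + 1
    return {pos for pos, votes in counts.items() if votes >= k}


def precision_recall_f1(predicted: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Score predicted PHI positions against gold-standard annotations."""
    tp = len(predicted & gold)  # correctly flagged positions
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical outputs standing in for three tools (toy data, not from the study).
tool_outputs = [{1, 2, 5}, {2, 5, 7}, {2, 9}]
gold = {2, 5, 7}

for k in (1, 2, 3):
    p, r, f = precision_recall_f1(k_vote(tool_outputs, k), gold)
    print(f"{k}-vote: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

On this toy input the loop prints precision 0.60 / recall 1.00 for 1-vote, 1.00 / 0.67 for 2-vote, and 1.00 / 0.33 for 3-vote: requiring fewer votes raises recall, and requiring more votes raises precision, the same direction of trade-off the abstract reports for its 1-Vote and 3-Votes ensembles.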

Identifiers

pubmed: 35715655
doi: 10.1007/s10278-022-00673-0
pii: 10.1007/s10278-022-00673-0
pmc: PMC9712864

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1694-1698

Grants

Agency: NIBIB NIH HHS
ID: T32 EB009384
Country: United States

Copyright information

© 2022. The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.

Authors

Hannah Horng (H)

Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA.

Jackson Steinkamp (J)

Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA.

Charles E Kahn (CE)

Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA. ckahn@upenn.edu.
Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA. ckahn@upenn.edu.

Tessa S Cook (TS)

Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA.
