Auditing Learned Associations in Deep Learning Approaches to Extract Race and Ethnicity from Clinical Text.
Journal
AMIA ... Annual Symposium proceedings. AMIA Symposium
ISSN: 1942-597X
Abbreviated title: AMIA Annu Symp Proc
Country: United States
NLM ID: 101209213
Publication information
Publication date:
2023
History:
medline: 2024-01-15
pubmed: 2024-01-15
entrez: 2024-01-15
Status:
epublish
Abstract
Complete and accurate race and ethnicity (RE) patient information is important for many areas of biomedical informatics research, such as defining and characterizing cohorts, performing quality assessments, and identifying health inequities. Patient-level RE data is often inaccurate or missing in structured sources, but can be supplemented through clinical notes and natural language processing (NLP). While NLP has made many improvements in recent years with large language models, bias remains an often-unaddressed concern, with research showing that harmful and negative language is more often used for certain racial/ethnic groups than others. We present an approach to audit the learned associations of models trained to identify RE information in clinical text by measuring the concordance between model-derived salient features and manually identified RE-related spans of text. We show that while models perform well on the surface, there exist concerning learned associations and potential for future harms from RE-identification models if left unaddressed.
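The abstract does not say how concordance is computed. As a minimal sketch of one way to compare model-derived salient features with manually identified spans, the Python below scores the overlap between the top-k salient tokens and the annotated token positions. The function names, the top-k cutoff, the half-open token-span format, and the choice of precision, recall, and Jaccard overlap are all assumptions made for illustration, not the authors' method. The saliency scores in the example are made-up toy values.

```python
from typing import Dict, List, Set, Tuple


def top_k_salient(scores: List[float], k: int) -> Set[int]:
    """Return the indices of the k tokens with the highest saliency score."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def span_token_indices(spans: List[Tuple[int, int]]) -> Set[int]:
    """Expand half-open [start, end) token spans into a set of token indices."""
    return {i for start, end in spans for i in range(start, end)}


def concordance(
    scores: List[float], spans: List[Tuple[int, int]], k: int
) -> Dict[str, float]:
    """Overlap between top-k salient tokens and annotated RE tokens (hypothetical metric)."""
    salient = top_k_salient(scores, k)
    gold = span_token_indices(spans)
    overlap = salient & gold
    union = salient | gold
    return {
        # Share of salient tokens that fall inside an annotated span.
        "precision": len(overlap) / len(salient) if salient else 0.0,
        # Share of annotated tokens that the model treated as salient.
        "recall": len(overlap) / len(gold) if gold else 0.0,
        # Symmetric overlap between the two token sets.
        "jaccard": len(overlap) / len(union) if union else 0.0,
    }


# Toy example: invented tokens and saliency scores, one annotated span.
tokens = ["Patient", "is", "a", "45", "year", "old", "Hispanic", "woman"]
scores = [0.05, 0.01, 0.01, 0.10, 0.02, 0.03, 0.60, 0.30]
annotated_spans = [(6, 7)]  # covers the token "Hispanic"

result = concordance(scores, annotated_spans, k=2)
print(result)
```

In this toy case, low precision would indicate that the model relies on tokens outside the annotated span, which is the kind of learned association an audit would flag for manual review.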
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
289-298
Copyright information
©2023 AMIA - All rights reserved.