Auditing Learned Associations in Deep Learning Approaches to Extract Race and Ethnicity from Clinical Text.


Journal

AMIA ... Annual Symposium proceedings. AMIA Symposium
ISSN: 1942-597X
Abbreviated title: AMIA Annu Symp Proc
Country: United States
NLM ID: 101209213

Publication information

Publication date:
2023
History:
medline: 15 1 2024
pubmed: 15 1 2024
entrez: 15 1 2024
Status: epublish

Abstract

Complete and accurate race and ethnicity (RE) patient information is important for many areas of biomedical informatics research, such as defining and characterizing cohorts, performing quality assessments, and identifying health inequities. Patient-level RE data is often inaccurate or missing in structured sources, but can be supplemented through clinical notes and natural language processing (NLP). While NLP has made many improvements in recent years with large language models, bias remains an often-unaddressed concern, with research showing that harmful and negative language is more often used for certain racial/ethnic groups than others. We present an approach to audit the learned associations of models trained to identify RE information in clinical text by measuring the concordance between model-derived salient features and manually identified RE-related spans of text. We show that while models perform well on the surface, there exist concerning learned associations and potential for future harms from RE-identification models if left unaddressed.
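The abstract describes auditing a model by measuring the concordance between model-derived salient features and manually identified RE-related text spans. The sketch below shows one possible way to compute such a concordance. It is an illustration under stated assumptions and not the authors' published method. It assumes that per-token saliency scores are already available, for example from a gradient-based attribution method, and that annotated spans are given as half-open token index ranges. The function name `saliency_span_concordance`, the top-k selection rule, and the precision/recall/F1 scoring are all hypothetical choices.

```python
from typing import Dict, List, Tuple


def saliency_span_concordance(
    saliency: List[float],
    spans: List[Tuple[int, int]],
    k: int,
) -> Dict[str, float]:
    """Hypothetical concordance score between salient tokens and annotated spans.

    saliency: one attribution score per token (assumed precomputed).
    spans: manually annotated RE-related spans as half-open [start, end) token indices.
    k: number of most salient tokens to compare against the annotations.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Token positions covered by at least one annotated RE span.
    annotated = {i for start, end in spans for i in range(start, end)}
    # Rank tokens by descending saliency; ties are broken by token position.
    ranked = sorted(range(len(saliency)), key=lambda i: (-saliency[i], i))
    top_k = set(ranked[:k])
    hits = len(top_k & annotated)
    # Precision: share of salient tokens that fall inside annotated spans.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: share of annotated tokens that the model ranks as salient.
    recall = hits / len(annotated) if annotated else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy, made-up example: only "Hispanic" (index 6) is annotated as RE-related.
    tokens = ["Patient", "is", "a", "54", "year", "old",
              "Hispanic", "woman", "with", "asthma"]
    scores = [0.1, 0.0, 0.0, 0.05, 0.02, 0.03, 0.9, 0.4, 0.0, 0.6]
    print(saliency_span_concordance(scores, [(6, 7)], k=3))
```

The three most salient tokens in this toy input are "Hispanic", "asthma", and "woman". Only one of them lies inside the annotated span, so precision is 1/3, recall is 1.0, and F1 is 0.5. Low precision of this kind would indicate that the model relies on tokens outside the annotated RE spans, such as a diagnosis. The audit described in the abstract is meant to surface exactly this type of learned association.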

Identifiers

pubmed: 38222422
pii: 927
pmc: PMC10785932

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

289-298

Copyright information

©2023 AMIA - All rights reserved.

Authors

Oliver J Bear Don't Walk IV (OJ)

University of Washington, Seattle, WA.

Adrienne Pichon (A)

Columbia University, New York, New York.

Harry Reyes Nieva (HR)

Columbia University, New York, New York.
Harvard Medical School, Boston, Massachusetts.

Tony Sun (T)

Columbia University, New York, New York.

Jaan Altosaar (J)

One Fact Foundation, Claymont, DE.

Karthik Natarajan (K)

Columbia University, New York, New York.

Adler Perotte (A)

Columbia University, New York, New York.

Peter Tarczy-Hornoch (P)

University of Washington, Seattle, WA.

Dina Demner-Fushman (D)

US National Library of Medicine, Bethesda, Maryland.

Noémie Elhadad (N)

Columbia University, New York, New York.

MeSH classifications