An intentional approach to managing bias in general purpose embedding models.

Journal

The Lancet. Digital health

ISSN: 2589-7500

Abbreviated title: Lancet Digit Health

Country: England

NLM ID: 101751302

Publication information

Publication date:
Feb 2024

History:

received: 15 08 2023

revised: 24 10 2023

accepted: 02 11 2023

medline: 27 1 2024

pubmed: 27 1 2024

entrez: 26 1 2024

Status: ppublish

Abstract

Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard to pinpoint by both algorithms and people. This finding raises a question about how to best design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well intentioned attempts to prevent the upstream components (GPPEs) from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. We present reasons, by building on previously published data, to support the reasoning that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.

Identifiers

DOI: 10.1016/S2589-7500(23)00227-3 PMID: 38278614

pubmed: 38278614

pii: S2589-7500(23)00227-3

doi: 10.1016/S2589-7500(23)00227-3

Publication types

Journal Article; Review

Languages

eng

Citation subsets

Pagination

e126-e130

Copyright information

Conflict of interest statement

W-HW, AS, APK, AD’A, JP, RP, SP, VN, SA, AK, HC-L, YM, GSC, DRW, SS, SP, KE, and YL are employees of Google and hold Alphabet stock. W-HW, AS, APK, AD’A, RP, SP, VN, SA, AK, HC-L, YM, GSC, DRW, SS, SP, KE, and YL have patents filed or in progress under Google, broadly related to machine learning and embedding models. CL performs work at Google as a medical consultant via Vituity and receives consulting fees for clinical perspective and guidance. LAGC receives support for educational events and meetings from the National Institutes of Health, Stanford University, University of California San Francisco, University of Toronto, College of Intensive Care Medicine of Australia and New Zealand, University of Bergen, Amsterdam University Medical Centers, Académie Nationale de Médecine (France), and the Doris Duke Foundation (for the Reconsidering Race in Clinical Algorithms workshop at the National Academy of Medicine, Washington, DC). LAGC is an Editor in Chief of PLOS Digital Health, and on the advisory board of the Lancet Digital Health.

Authors

Wei-Hung Weng (WH)

Andrew Sellergen (A)

Atilla P Kiraly (AP)

Alexander D'Amour (A)

Jungyeon Park (J)

Rory Pilgrim (R)

Stephen Pfohl (S)

Charles Lau (C)

Vivek Natarajan (V)

Shekoofeh Azizi (S)

Alan Karthikesalingam (A)

Heather Cole-Lewis (H)

Yossi Matias (Y)

Greg S Corrado (GS)

Dale R Webster (DR)

Shravya Shetty (S)

Shruthi Prabhakara (S)

Krish Eswaran (K)

Leo A G Celi (LAG)

Yun Liu (Y)

MeSH classifications