Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study.

Keywords: Artificial intelligence; Dermatology; Health equity; Machine learning

Journal

EClinicalMedicine
ISSN: 2589-5370
Abbreviated title: EClinicalMedicine
Country: England
NLM ID: 101733727

Publication information

Publication date:
Apr 2024
History:
received: 2023-09-21
revised: 2024-01-16
accepted: 2024-01-25
medline: 2024-04-30
pubmed: 2024-04-30
entrez: 2024-04-30
Status: epublish

Abstract sections

Background
Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study.
Methods
We propose a methodology, complementary to existing fairness metrics, to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework, which quantitatively assesses the performance equity of health AI technologies through a four-step interdisciplinary process for understanding and quantifying domain-specific criteria, and through the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients aged 20 years or older, submitted from primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes as compared to others. The likelihood that AI performance was anticorrelated with pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (denoted R) was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared to others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case.
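The HEAL metric described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that the bootstrap resamples individual cases, that each subpopulation has a single health outcome score where a higher value means a better outcome, and that resamples with an undefined correlation are skipped. The names `average_ranks`, `spearman`, `heal_metric`, `cases` and `outcome_by_group` are hypothetical.

```python
import random
from collections import defaultdict


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks; None if undefined."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None  # all ranks tied on one side: correlation is undefined
    return cov / (vx * vy) ** 0.5


def heal_metric(cases, outcome_by_group, n_boot=1000, seed=0):
    """Estimate p(R > 0), with R the negated Spearman correlation between
    subpopulation health outcome scores and subpopulation AI performance.

    cases: list of (group_label, is_top3_correct) pairs, one per case.
    outcome_by_group: dict mapping group_label to a health outcome score,
        where higher means better (an assumption of this sketch).
    """
    rng = random.Random(seed)
    labels = sorted(outcome_by_group)
    outcomes = [outcome_by_group[g] for g in labels]
    hits, used = 0, 0
    for _ in range(n_boot):
        # Resample cases with replacement (assumed bootstrap unit).
        sample = rng.choices(cases, k=len(cases))
        totals, correct = defaultdict(int), defaultdict(int)
        for g, ok in sample:
            totals[g] += 1
            correct[g] += int(ok)
        if any(totals[g] == 0 for g in labels):
            continue  # a subpopulation is missing from this resample
        perf = [correct[g] / totals[g] for g in labels]
        rho = spearman(outcomes, perf)
        if rho is None:
            continue
        used += 1
        hits += (-rho) > 0  # R > 0: worse outcomes pair with better performance
    return hits / used if used else float("nan")
```

For burden measures such as DALYs or YLLs, where a higher value means a worse outcome, the scores would need to be negated (or otherwise inverted) before being passed as `outcome_by_group` under this sketch's convention. Multiplying the returned probability by 100 gives a percentage comparable in form to the values reported in the Findings.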
Findings
Across all dermatologic conditions, the HEAL metric was 80.5% for prioritising AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0% respectively for prioritising AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritising AI performance of age subpopulations based on DALYs.
Interpretation
Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes.
Funding
Google LLC.

Identifiers

pubmed: 38685924
doi: 10.1016/j.eclinm.2024.102479
pii: S2589-5370(24)00058-0
pmc: PMC11056401

Publication types

Journal Article

Languages

eng

Pagination

102479

Copyright information

© 2024 The Author(s).

Conflict of interest statement

This study was funded by Google LLC. MS, TS, HC, EW, SP, DM, RJ, GK, YL, SF, QX, CH, PS, FT, PB, LHP, CHM, YM, GSC, DW, SV, CS, YL, IH, PHCC are current or former employees of Google and own stock as part of the standard compensation package. MP and JL are paid consultants of Google.

Authors

Mike Schaekermann (M)

Google Health, Mountain View, CA, USA.

Terry Spitz (T)

Google Health, Mountain View, CA, USA.

Malcolm Pyles (M)

Advanced Clinical, Deerfield, IL, USA.
Department of Dermatology, Cleveland Clinic, Cleveland, OH, USA.

Heather Cole-Lewis (H)

Google Health, Mountain View, CA, USA.

Ellery Wulczyn (E)

Google Health, Mountain View, CA, USA.

Stephen R Pfohl (SR)

Google Health, Mountain View, CA, USA.

Donald Martin (D)

Google Health, Mountain View, CA, USA.

Ronnachai Jaroensri (R)

Google Health, Mountain View, CA, USA.

Geoff Keeling (G)

Google Health, Mountain View, CA, USA.

Yuan Liu (Y)

Google Health, Mountain View, CA, USA.

Stephanie Farquhar (S)

Google Health, Mountain View, CA, USA.

Qinghan Xue (Q)

Google Health, Mountain View, CA, USA.

Jenna Lester (J)

Advanced Clinical, Deerfield, IL, USA.
Department of Dermatology, University of California, San Francisco, CA, USA.

Cían Hughes (C)

Google Health, Mountain View, CA, USA.

Patricia Strachan (P)

Google Health, Mountain View, CA, USA.

Fraser Tan (F)

Google Health, Mountain View, CA, USA.

Peggy Bui (P)

Google Health, Mountain View, CA, USA.

Craig H Mermel (CH)

Google Health, Mountain View, CA, USA.

Lily H Peng (LH)

Google Health, Mountain View, CA, USA.

Yossi Matias (Y)

Google Health, Mountain View, CA, USA.

Greg S Corrado (GS)

Google Health, Mountain View, CA, USA.

Dale R Webster (DR)

Google Health, Mountain View, CA, USA.

Sunny Virmani (S)

Google Health, Mountain View, CA, USA.

Christopher Semturs (C)

Google Health, Mountain View, CA, USA.

Yun Liu (Y)

Google Health, Mountain View, CA, USA.

Ivor Horn (I)

Google Health, Mountain View, CA, USA.

Po-Hsuan Cameron Chen (PH)

Google Health, Mountain View, CA, USA.

MeSH classifications