Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study.

Keywords: Artificial intelligence; Dermatology; Health equity; Machine learning

Journal

EClinicalMedicine

ISSN: 2589-5370

Abbreviated title: EClinicalMedicine

Country: England

NLM ID: 101733727

Publication information

Publication date:
Apr 2024

History:

received: 21 Sep 2023

revised: 16 Jan 2024

accepted: 25 Jan 2024

medline: 30 Apr 2024

pubmed: 30 Apr 2024

entrez: 30 Apr 2024

Status: epublish

Abstract

Background

Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study.

Methods

Here, we propose a methodology, complementary to existing fairness metrics, to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework, a four-step interdisciplinary process for understanding and quantifying domain-specific criteria that quantitatively assesses the performance equity of health AI technologies, together with the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients aged 20 years or older, submitted by primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes than for others. The likelihood that AI performance was anticorrelated with pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (i.e., "R") was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared with others (presented as a percentage below).
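
The HEAL metric defined above can be written compactly as follows. This is a sketch, not the authors' notation: the symbols K, y_k, a_k, B and R^(b) are introduced here, and it assumes health outcomes are oriented so that larger values mean better health (for example, negated DALYs or YLLs per 100,000), which makes R > 0 correspond to better AI performance in subpopulations with worse outcomes, as the abstract states.

```latex
% K subpopulations k = 1, ..., K (e.g. age, sex or race/ethnicity groups)
% y_k : pre-existing health outcome of group k, oriented so that larger = better health (assumption)
% a_k : AI performance in group k (e.g. top-3 agreement rate)
% R^{(b)} : R recomputed on the b-th of B bootstrap resamples
R = -\rho_{\mathrm{s}}\bigl((y_1,\dots,y_K),\,(a_1,\dots,a_K)\bigr),
\qquad
\mathrm{HEAL} = p(R > 0) \approx \frac{1}{B}\sum_{b=1}^{B}\mathbf{1}\!\left[R^{(b)} > 0\right]
```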

Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and as years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case.

Findings

Across all dermatologic conditions, the HEAL metric was 80.5% for prioritising AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0%, respectively, for prioritising AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared with a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritising AI performance of age subpopulations based on DALYs.

Interpretation

Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes.

Funding

Google LLC.
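
The following Python sketch shows one way to compute the quantities described in the Methods: per-case top-3 agreement, per-subpopulation performance, the negated Spearman correlation R, and a bootstrap estimate of p(R > 0). It is not the authors' implementation. The function names, the within-group case resampling scheme, treating an undefined correlation as zero, and the orientation of health outcomes (higher = better health, e.g. negated DALYs) are all assumptions made for illustration, and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def top3_agreement(top3_predictions, reference):
    """1.0 if the reference diagnosis is among the model's top-3 predictions, else 0.0."""
    return 1.0 if reference in top3_predictions[:3] else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks (as in Spearman's rho)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # correlation undefined (constant input); treated as 0 (assumption)
    return cov / (vx * vy) ** 0.5


def heal_metric(cases, outcome_by_group, n_boot=1000, seed=0):
    """Bootstrap estimate of p(R > 0), where R = -spearman(health outcome, AI performance).

    cases: iterable of (group, per-case score) pairs, e.g. top-3 agreement in {0, 1}.
    outcome_by_group: group -> health outcome, oriented so HIGHER = BETTER health
        (e.g. negated DALYs per 100,000; an assumption of this sketch).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for group, score in cases:
        by_group[group].append(score)
    groups = sorted(by_group)
    outcomes = [outcome_by_group[g] for g in groups]  # treated as fixed across resamples
    wins = 0
    for _ in range(n_boot):
        # Resample cases with replacement within each group (assumed scheme).
        perf = []
        for g in groups:
            scores = by_group[g]
            sample = [rng.choice(scores) for _ in scores]
            perf.append(sum(sample) / len(sample))
        r = -spearman_rho(outcomes, perf)
        if r > 0:
            wins += 1
    return wins / n_boot


if __name__ == "__main__":
    # Hypothetical toy data, not from the study: (group, model top-3 list, reference diagnosis).
    raw_cases = [
        ("group_a", ["melanoma", "nevus", "seborrheic keratosis"], "melanoma"),
        ("group_a", ["eczema", "psoriasis", "tinea"], "psoriasis"),
        ("group_b", ["acne", "rosacea", "folliculitis"], "rosacea"),
        ("group_b", ["eczema", "urticaria", "scabies"], "psoriasis"),
        ("group_c", ["nevus", "lentigo", "melanoma"], "basal cell carcinoma"),
        ("group_c", ["tinea", "eczema", "psoriasis"], "tinea"),
    ]
    scored = [(g, top3_agreement(p, r)) for g, p, r in raw_cases]
    # Hypothetical burden values, negated so that higher = better health.
    outcome = {"group_a": -3000.0, "group_b": -2000.0, "group_c": -1000.0}
    print(f"HEAL metric: {100 * heal_metric(scored, outcome):.1f}%")
```

Resampling within each group keeps subpopulation sizes fixed across bootstrap replicates; resampling the whole evaluation set would be an equally plausible reading of the abstract and could give somewhat different estimates.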

Identifiers

DOI: 10.1016/j.eclinm.2024.102479 PMID: 38685924 PMC: PMC11056401

pii: S2589-5370(24)00058-0

Publication types

Journal Article

Languages

eng

Pagination

102479

Declaration of interests

This study was funded by Google LLC. MS, TS, HC, EW, SP, DM, RJ, GK, YL, SF, QX, CH, PS, FT, PB, LHP, CHM, YM, GSC, DW, SV, CS, YL, IH, PHCC are current or former employees of Google and own stock as part of the standard compensation package. MP and JL are paid consultants of Google.

