Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study.
Keywords
Artificial intelligence
Dermatology
Health equity
Machine learning
Journal
EClinicalMedicine
ISSN: 2589-5370
Abbreviated title: EClinicalMedicine
Country: England
NLM ID: 101733727
Publication information
Publication date: Apr 2024
History:
received: 21 Sep 2023
revised: 16 Jan 2024
accepted: 25 Jan 2024
medline: 30 Apr 2024
pubmed: 30 Apr 2024
entrez: 30 Apr 2024
Status: epublish
Abstract
Background
Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study.
Methods
Here, we propose a methodology, complementary to existing fairness metrics, to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework, which quantitatively assesses the performance equity of health AI technologies through a four-step interdisciplinary process for understanding and quantifying domain-specific criteria, and the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients aged 20 years or older, submitted by primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes than for others. This likelihood, that AI performance is anticorrelated with pre-existing health outcomes, was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (R) was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes compared to others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and as years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case.
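The HEAL metric described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical; each bootstrap replicate resamples cases with replacement while subgroup health outcomes are held fixed; and the outcome measure is assumed to be oriented so that higher values mean better health (a burden measure such as DALYs or YLLs would be passed negated). Under that orientation, R = -ρ(outcome, performance) is positive when subgroups with worse outcomes have higher top-3 agreement, matching the interpretation of R given in the abstract.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # undefined if either variable is constant; sketch treats it as 0
    return cov / (sx * sy)


def top3_agreement(predictions, reference):
    """1.0 if the reference diagnosis is among the top 3 ranked predictions."""
    return 1.0 if reference in predictions[:3] else 0.0


def heal_metric(cases, outcome_by_group, n_boot=1000, seed=0):
    """Estimate p(R > 0) with R = -spearman(outcome, performance) over subgroups.

    cases: list of (group, ranked_predictions, reference_diagnosis) tuples;
        every group must appear as a key of outcome_by_group.
    outcome_by_group: ASSUMPTION: higher value = better health
        (e.g. pass negated DALYs or YLLs per subgroup).
    """
    rng = random.Random(seed)
    groups = sorted(outcome_by_group)
    outcomes = [outcome_by_group[g] for g in groups]
    positive = valid = 0
    for _ in range(n_boot):
        # ASSUMPTION: resample individual cases with replacement.
        sample = [rng.choice(cases) for _ in cases]
        hits = {g: [] for g in groups}
        for group, preds, ref in sample:
            hits[group].append(top3_agreement(preds, ref))
        if any(not hits[g] for g in groups):
            continue  # skip replicates in which a subgroup has no cases
        valid += 1
        performance = [mean(hits[g]) for g in groups]
        if -spearman(outcomes, performance) > 0:
            positive += 1
    return positive / valid if valid else float("nan")
```

For example, with two subgroups where the subgroup with the worse outcome always has the reference diagnosis in its top 3 and the other never does, every replicate yields R > 0 and the estimate is 1.0. Average ranks for ties and treating an undefined correlation as zero (so it never counts as R > 0) are design choices of this sketch.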
Findings
Across all dermatologic conditions, the HEAL metric was 80.5% for prioritising AI performance across racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0% for prioritising AI performance across sex and age subpopulations, respectively, based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritising AI performance across age subpopulations based on DALYs.
Interpretation
Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes.
Funding
Google LLC.
Identifiers
pubmed: 38685924
doi: 10.1016/j.eclinm.2024.102479
pii: S2589-5370(24)00058-0
pmc: PMC11056401
Publication types
Journal Article
Languages
eng
Pagination
102479
Copyright information
© 2024 The Author(s).
Declaration of interests
This study was funded by Google LLC. MS, TS, HC, EW, SP, DM, RJ, GK, YL, SF, QX, CH, PS, FT, PB, LHP, CHM, YM, GSC, DW, SV, CS, YL, IH, PHCC are current or former employees of Google and own stock as part of the standard compensation package. MP and JL are paid consultants of Google.