The value of standards for health datasets in artificial intelligence-based applications.
Journal
Nature medicine
ISSN: 1546-170X
Abbreviated title: Nat Med
Country: United States
NLM ID: 9502015
Publication information
Publication date:
Nov 2023
History:
received: 14 Mar 2023
accepted: 22 Sep 2023
medline: 27 Nov 2023
pubmed: 27 Oct 2023
entrez: 26 Oct 2023
Status:
ppublish
Abstract
Artificial intelligence as a medical device is increasingly being applied to healthcare for diagnosis, risk stratification and resource allocation. However, a growing body of evidence has highlighted the risk of algorithmic bias, which may perpetuate existing health inequity. This problem arises in part because of systemic inequalities in dataset curation, unequal opportunity to participate in research and inequalities of access. This study aims to explore existing standards, frameworks and best practices for ensuring adequate data diversity in health datasets. Exploring the existing literature and expert views is an important step towards the development of consensus-based guidelines. The study comprises two parts: a systematic review of existing standards, frameworks and best practices for healthcare datasets; and a survey and thematic analysis of stakeholder views of bias, health equity and best practices for artificial intelligence as a medical device. We found that the need for dataset diversity was well described in the literature, and experts generally favored the development of a robust set of guidelines, but there were mixed views about how these could be implemented in practice. The outputs of this study will be used to inform the development of standards for transparency of data diversity in health datasets (the STANDING Together initiative).
Identifiers
pubmed: 37884627
doi: 10.1038/s41591-023-02608-w
pii: 10.1038/s41591-023-02608-w
pmc: PMC10667100
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
2929-2938
Grants
Agency: Medical Research Council
ID: MC_PC_21015
Country: United Kingdom
Agency: Medical Research Council
ID: MC_PC_21055
Country: United Kingdom
Copyright information
© 2023. The Author(s).