Preserving fairness and diagnostic accuracy in private large-scale AI models for medical imaging.


Journal

Communications medicine
ISSN: 2730-664X
Abbreviated title: Commun Med (Lond)
Country: England
NLM ID: 9918250414506676

Publication information

Publication date:
14 Mar 2024
History:
received: 06 Apr 2023
accepted: 16 Feb 2024
medline: 15 Mar 2024
pubmed: 15 Mar 2024
entrez: 15 Mar 2024
Status: epublish

Abstract

BACKGROUND
Artificial intelligence (AI) models are increasingly used in the medical domain. However, as medical data is highly sensitive, special precautions to ensure its protection are required. The gold standard for privacy preservation is the introduction of differential privacy (DP) to model training. Prior work indicates that DP has negative implications for model accuracy and fairness, which are unacceptable in medicine and represent a main barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training.
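Differential privacy is commonly introduced into deep learning training through DP-SGD: each example's gradient is clipped to a fixed L2 norm, and Gaussian noise calibrated to that norm is added to the summed gradient before the update. This record does not detail the authors' training procedure, so the following is only a minimal, dependency-free sketch of one such aggregation step, assuming per-example gradients are given as flat lists of floats. The function names and the `clip_norm` and `noise_multiplier` parameters are illustrative, and the sketch omits the privacy accounting that converts the noise level, sampling rate, and number of steps into an (ε, δ) guarantee.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """Illustrative DP-SGD aggregation (hypothetical helper, not a library API).

    1. Clip every example's gradient to L2 norm <= clip_norm, bounding the
       influence any single patient can have on the update.
    2. Sum the clipped gradients.
    3. Add Gaussian noise with std = noise_multiplier * clip_norm per coordinate.
    4. Divide by the batch size to obtain the noisy average gradient.
    No privacy accounting is performed here.
    """
    if not per_example_grads:
        raise ValueError("batch must contain at least one example")
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, clip_norm)):
            summed[i] += g
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / batch_size for v in noisy]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)))
```

Setting `noise_multiplier` to 0 returns the plain average of the clipped gradients, which separates the effect of clipping from the effect of the added noise; in practice, both contribute to the accuracy gap between private and non-private models.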
METHODS
We used two datasets: (1) a large dataset (N = 193,311) of high-quality clinical chest radiographs, and (2) a dataset (N = 1625) of 3D abdominal computed tomography (CT) images, with the task of classifying the presence of pancreatic ductal adenocarcinoma (PDAC). Both were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs, measured as area under the receiver operating characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or Statistical Parity Difference.
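The evaluation metrics named above can be sketched in a few lines of Python. This is an illustrative, dependency-free version, not the authors' evaluation code. AUROC is computed as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative one. Statistical Parity Difference follows the common convention of the positive-prediction rate in one group minus that in a reference group, computed on binarized predictions; the threshold used to binarize scores and the group encoding are assumptions. Pearson's r is the standard sample correlation. This record does not specify which patient attributes were paired with which fairness metric.

```python
import math


def auroc(labels, scores):
    """AUROC as the Mann-Whitney statistic: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def statistical_parity_difference(predictions, groups, group, reference):
    """P(pred = 1 | group) - P(pred = 1 | reference) for binary predictions.

    0 means both groups receive positive predictions at the same rate.
    The sign convention (group minus reference) is an assumption.
    """
    def positive_rate(g):
        preds = [p for p, grp in zip(predictions, groups) if grp == g]
        if not preds:
            raise ValueError(f"no samples for group {g!r}")
        return sum(preds) / len(preds)

    return positive_rate(group) - positive_rate(reference)


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(statistical_parity_difference([1, 0, 1, 1, 0, 0], ["A", "A", "A", "B", "B", "B"], "A", "B"))
    print(pearson_r([1, 2, 3], [2, 4, 6]))
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive/negative pairs are ranked correctly. The pairwise AUROC loop is quadratic in the number of cases; for datasets of the size described here, a rank-based formula or an established implementation such as scikit-learn's `roc_auc_score` is the practical choice.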
RESULTS
We find that, while the privacy-preserving training yields lower accuracy, it largely does not amplify discrimination against age, sex or co-morbidity. However, we find an indication that difficult diagnoses and subgroups suffer stronger performance hits in private training.
CONCLUSIONS
Our study shows that, under the challenging, realistic circumstances of a real-life clinical dataset, privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.

Other abstracts

Type: plain-language-summary (eng)
Artificial intelligence (AI), in which computers can learn to do tasks that normally require human intelligence, is particularly useful in medical imaging. However, AI should be used in a way that preserves patient privacy. We explored the balance between maintaining patient data privacy and AI performance in medical imaging. We used an approach called differential privacy to protect the privacy of patients’ images. We show that, although training AI with differential privacy leads to a slight decrease in accuracy, it does not substantially increase bias against different age groups, genders, or patients with multiple health conditions. However, we noticed that AI faces more challenges in accurately diagnosing complex cases and specific subgroups when trained under these privacy constraints. These findings highlight the importance of designing AI systems that are both privacy-conscious and capable of reliable diagnoses across patient groups.

Identifiers

pubmed: 38486100
doi: 10.1038/s43856-024-00462-6
pii: 10.1038/s43856-024-00462-6

Publication types

Journal Article

Languages

eng

Pagination

46

Grants

Agency: Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie (Federal Ministry for Education, Science, Research and Technology)
ID: 01ZZ2316C
Agency: Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research)
ID: 01KX2021
Agency: Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research)
ID: 01KD2215B
Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 101057091

Copyright information

© 2024. The Author(s).

Authors

Soroosh Tayebi Arasteh

Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany. soroosh.arasteh@rwth-aachen.de.

Alexander Ziller

Institute of Diagnostic and Interventional Radiology, Technical University of Munich, Munich, Germany. alex.ziller@tum.de.
Artificial Intelligence in Healthcare and Medicine, Technical University of Munich, Munich, Germany. alex.ziller@tum.de.

Christiane Kuhl

Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany.

Marcus Makowski

Institute of Diagnostic and Interventional Radiology, Technical University of Munich, Munich, Germany.

Sven Nebelung

Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany.

Rickmer Braren

Institute of Diagnostic and Interventional Radiology, Technical University of Munich, Munich, Germany.

Daniel Rueckert

Artificial Intelligence in Healthcare and Medicine, Technical University of Munich, Munich, Germany.

Daniel Truhn

Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany. dtruhn@ukaachen.de.

Georgios Kaissis

Institute of Diagnostic and Interventional Radiology, Technical University of Munich, Munich, Germany. g.kaissis@tum.de.
Artificial Intelligence in Healthcare and Medicine, Technical University of Munich, Munich, Germany. g.kaissis@tum.de.
Department of Computing, Imperial College London, London, United Kingdom. g.kaissis@tum.de.
Institute for Machine Learning in Biomedical Imaging, Helmholtz Munich, Neuherberg, Germany. g.kaissis@tum.de.

MeSH classifications