Evaluation of federated learning variations for COVID-19 diagnosis using chest radiographs from 42 US and European hospitals.
COVID-19
artificial intelligence
computer vision
federated learning
Journal
Journal of the American Medical Informatics Association : JAMIA
ISSN: 1527-974X
Abbreviated title: J Am Med Inform Assoc
Country: England
NLM ID: 9430800
Publication information
Publication date:
13 12 2022
History:
received: 25 04 2022
revised: 31 08 2022
accepted: 07 10 2022
pubmed: 11 10 2022
medline: 16 12 2022
entrez: 10 10 2022
Status:
ppublish
Abstract
Federated learning (FL) allows multiple distributed data holders to collaboratively learn a shared model without data sharing. However, individual health system data are heterogeneous. "Personalized" FL variations have been developed to counter data heterogeneity, but few have been evaluated using real-world healthcare data. The purpose of this study is to investigate the performance of a single-site versus a 3-client federated model using a previously described Coronavirus Disease 2019 (COVID-19) diagnostic model. Additionally, to investigate the effect of system heterogeneity, we evaluate the performance of 4 FL variations. We leverage an FL healthcare collaborative including data from 5 international healthcare systems (US and Europe) encompassing 42 hospitals. We implemented a COVID-19 computer vision diagnosis system using the Federated Averaging (FedAvg) algorithm implemented on Clara Train SDK 4.0. To study the effect of data heterogeneity, training data was pooled from 3 systems locally and federation was simulated. We compared a centralized/pooled model, versus FedAvg, and 3 personalized FL variations (FedProx, FedBN, and FedAMP). We observed comparable model performance with respect to internal validation (local model: AUROC 0.94 vs FedAvg: 0.95, P = .5) and improved model generalizability with the FedAvg model (P < .05). When investigating the effects of model heterogeneity, we observed poor performance with FedAvg on internal validation as compared to personalized FL algorithms. FedAvg did have improved generalizability compared to personalized FL algorithms. On average, FedBN had the best rank performance on internal and external validation. FedAvg can significantly improve the generalization of the model compared to other personalized FL algorithms; however, at the cost of poor internal validity. Personalized FL may offer an opportunity to develop both internal and externally validated algorithms.
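As an illustration of the aggregation schemes named in the abstract, the following is a minimal sketch of a FedAvg-style weighted average and a FedBN-style update that keeps batch-normalization parameters local. It is not the study's implementation and does not use the Clara Train SDK API. The function names, the parameter names, the "bn" prefix convention, and the toy values are all assumptions made for this example.

```python
from typing import Dict, List

# Parameter name -> flat list of values (a stand-in for real model tensors).
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    total = sum(client_sizes)
    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


def fedbn_update(local: Params, global_avg: Params, bn_prefix: str = "bn") -> Params:
    """Adopt the averaged weights but keep batch-norm parameters local.

    Identifying batch-norm layers by a name prefix is an assumption of this sketch.
    """
    return {
        name: (local[name] if name.startswith(bn_prefix) else global_avg[name])
        for name in local
    }


# Hypothetical toy parameters for three clients (not data from the study).
clients = [
    {"conv.weight": [1.0, 2.0], "bn.weight": [0.5]},
    {"conv.weight": [3.0, 4.0], "bn.weight": [1.5]},
    {"conv.weight": [5.0, 6.0], "bn.weight": [2.5]},
]
sizes = [100, 100, 200]

global_params = fedavg(clients, sizes)
client0_after = fedbn_update(clients[0], global_params)
```

In this sketch every client receives the same averaged convolution weights, while each client's batch-norm parameters stay site-specific. That difference is one way to picture the trade-off the abstract reports between generalizability and internal validity.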
Identifiers
pubmed: 36214629
pii: 6754819
doi: 10.1093/jamia/ocac188
pmc: PMC9619688
Publication types
Journal Article
Research Support, U.S. Gov't, P.H.S.
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
54-63
Grants
Agency: Patient-Centered Outcomes Research Institute
ID: K12HS026379
Country: United States
Agency: NHLBI NIH HHS
ID: 75N92020C00008
Country: United States
Agency: NCATS NIH HHS
ID: KL2TR002492
Country: United States
Agency: NHLBI NIH HHS
ID: 75N92020C00021
Country: United States
Copyright information
© The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.