Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project.

Keywords: ancestry prediction; data collaboration; federated learning (FL); genomics; machine learning; phenotype prediction; polygenic scores

Journal

Frontiers in big data
ISSN: 2624-909X
Abbreviated title: Front Big Data
Country: Switzerland
NLM ID: 101770603

Publication information

Publication date:
2024
History:
received: 26 07 2023
accepted: 31 01 2024
medline: 15 3 2024
pubmed: 15 3 2024
entrez: 15 3 2024
Status: epublish

Abstract

Combining training data from multiple sources increases sample size and reduces confounding, leading to more accurate and less biased machine learning models. In healthcare, however, direct pooling of data is often not allowed by data custodians who are accountable for minimizing the exposure of sensitive information. Federated learning offers a promising solution to this problem by training a model in a decentralized manner, thus reducing the risk of data leakage. Although there is increasing utilization of federated learning on clinical data, its efficacy on individual-level genomic data has not been studied. This study lays the groundwork for the adoption of federated learning for genomic data by investigating its applicability in two scenarios: phenotype prediction on the UK Biobank data and ancestry prediction on the 1000 Genomes Project data. We show that federated models trained on data split into independent nodes achieve performance close to centralized models, even in the presence of significant inter-node heterogeneity. Additionally, we investigate how federated model accuracy is affected by communication frequency and suggest approaches to reduce computational complexity or communication costs.
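The abstract contrasts federated models trained on "independent nodes" with centralized models and mentions communication frequency. The sketch below illustrates those ideas with federated averaging (FedAvg), a widely used FL scheme; the record does not say which aggregation algorithm, model, or framework the study used, so the toy linear model, the function names (`local_training`, `fedavg`, `centralized`, `make_nodes`), the two-node split, and all parameter values are hypothetical and involve no genomic data.

```python
# Hypothetical toy illustration of federated averaging (FedAvg); not the study's code.
# Task: fit y = w*x + b with full-batch gradient descent on mean squared error.
from typing import List, Tuple

Data = List[Tuple[float, float]]


def local_training(w: float, b: float, data: Data, lr: float, local_steps: int) -> Tuple[float, float]:
    """Run `local_steps` full-batch gradient-descent steps on one node's private data."""
    n = len(data)
    for _ in range(local_steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def fedavg(nodes: List[Data], rounds: int, local_steps: int, lr: float = 0.1) -> Tuple[float, float]:
    """Each round: send (w, b) to every node, train locally, then average the returned
    parameters weighted by node size. Only parameters are exchanged; the (x, y) records
    never leave their node. A larger `local_steps` means more local computation between
    communications, i.e. less frequent communication."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in nodes)
    for _ in range(rounds):
        updates = [(len(d), local_training(w, b, d, lr, local_steps)) for d in nodes]
        w = sum(n * uw for n, (uw, _) in updates) / total
        b = sum(n * ub for n, (_, ub) in updates) / total
    return w, b


def centralized(nodes: List[Data], steps: int, lr: float = 0.1) -> Tuple[float, float]:
    """Baseline: pool every node's records in one place and train directly."""
    pooled = [record for d in nodes for record in d]
    return local_training(0.0, 0.0, pooled, lr, steps)


def make_nodes() -> List[Data]:
    """Hypothetical heterogeneous split: node A only sees x in [0, 1], node B only sees
    x in [1, 2] and holds more records. Labels follow y = 3x + 1 without noise."""
    node_a = [(i / 10, 3 * (i / 10) + 1) for i in range(11)]
    node_b = [(1 + i / 20, 3 * (1 + i / 20) + 1) for i in range(21)]
    return [node_a, node_b]


if __name__ == "__main__":
    nodes = make_nodes()
    print("centralized:", centralized(nodes, steps=2000))
    for k in (1, 10):
        print(f"fedavg, local_steps={k}:", fedavg(nodes, rounds=500, local_steps=k))
```

With `local_steps=1`, size-weighted averaging reproduces centralized gradient descent step for step, because the weighted mean of the node gradients equals the pooled gradient. On this noise-free toy, both settings approach the centralized solution (w ≈ 3, b ≈ 1) despite the nodes covering disjoint x ranges. With real, noisy, heterogeneous data, more local steps per round cut communication but can let node models drift apart; this toy does not reproduce the study's experiments or results.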

Identifiers

pubmed: 38487517
doi: 10.3389/fdata.2024.1266031
pmc: PMC10937521

Publication types

Journal Article

Languages

eng

Pagination

1266031

Copyright information

Copyright © 2024 Kolobkov, Mishra Sharma, Medvedev, Lebedev, Kosaretskiy and Vakhitov.

Conflict of interest statement

All authors were employed by GENXT LTD.

Authors

Dmitry Kolobkov

GENXT, Hinxton, United Kingdom.
Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Moscow, Russia.

Satyarth Mishra Sharma

GENXT, Hinxton, United Kingdom.
Center for Artificial Intelligence Technology, Skolkovo Institute of Science and Technology, Moscow, Russia.

Aleksandr Medvedev

GENXT, Hinxton, United Kingdom.
Center for Artificial Intelligence Technology, Skolkovo Institute of Science and Technology, Moscow, Russia.

Mikhail Lebedev

GENXT, Hinxton, United Kingdom.

Egor Kosaretskiy

GENXT, Hinxton, United Kingdom.

Ruslan Vakhitov

GENXT, Hinxton, United Kingdom.

MeSH classifications