Assessing Privacy Vulnerabilities in Genetic Data Sets: Scoping Review.
data anonymization
genetic privacy
privacy
reidentification
Journal
JMIR bioinformatics and biotechnology
ISSN: 2563-3570
Abbreviated title: JMIR Bioinform Biotechnol
Country: Canada
NLM ID: 101769661
Publication information
Publication date:
27 May 2024
History (day/month/year):
received: 06/11/2023
accepted: 29/03/2024
revised: 26/03/2024
medline: 27/6/2024
pubmed: 27/6/2024
entrez: 27/6/2024
Status:
epublish
Abstract
BACKGROUND
Genetic data are widely considered inherently identifiable. However, genetic data sets come in many shapes and sizes, and the feasibility of privacy attacks depends on their specific content. Assessing the reidentification risk of genetic data is complex, yet there is a lack of guidelines or recommendations that support data processors in performing such an evaluation.
OBJECTIVE
This study aims to gain a comprehensive understanding of the privacy vulnerabilities of genetic data and create a summary that can guide data processors in assessing the privacy risk of genetic data sets.
METHODS
We conducted a 2-step search, in which we first identified 21 reviews published between 2017 and 2023 on the topic of genomic privacy and then analyzed all references cited in the reviews (n=1645) to identify 42 unique original research studies that demonstrate a privacy attack on genetic data. We then evaluated the type and components of genetic data exploited for these attacks as well as the effort and resources needed for their implementation and their probability of success.
RESULTS
From our literature review, we derived 9 nonmutually exclusive features of genetic data that are both inherent to any genetic data set and informative about privacy risk: biological modality, experimental assay, data format or level of processing, germline versus somatic variation content, content of single nucleotide polymorphisms, short tandem repeats, aggregated sample measures, structural variants, and rare single nucleotide variants.
CONCLUSIONS
On the basis of our literature review, the evaluation of these 9 features covers the great majority of privacy-critical aspects of genetic data and thus provides a foundation and guidance for assessing genetic data risk.
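As an illustration only, the 9 features named in the results could be recorded per data set in a simple checklist structure. The sketch below is not taken from the study: the class name, field names, and example values are hypothetical, and it deliberately assigns no risk scores, because the abstract reports none.

```python
from dataclasses import dataclass, fields


@dataclass
class GeneticDataProfile:
    """Hypothetical checklist of the 9 features listed in the abstract."""
    biological_modality: str        # free text, e.g. "DNA" (example value is an assumption)
    experimental_assay: str         # free text, e.g. "whole genome sequencing"
    data_format: str                # data format or level of processing
    variation_origin: str           # "germline", "somatic", or "both"
    has_snps: bool                  # single nucleotide polymorphisms present
    has_strs: bool                  # short tandem repeats present
    has_aggregated_sample_measures: bool
    has_structural_variants: bool
    has_rare_snvs: bool             # rare single nucleotide variants present


def present_content_features(profile: GeneticDataProfile) -> list[str]:
    """Return the names of the boolean content features that are set to True."""
    return [
        f.name
        for f in fields(profile)
        if isinstance(getattr(profile, f.name), bool) and getattr(profile, f.name)
    ]


# Example data set description (all values are made up for illustration).
example = GeneticDataProfile(
    biological_modality="DNA",
    experimental_assay="whole genome sequencing",
    data_format="variant calls",
    variation_origin="germline",
    has_snps=True,
    has_strs=False,
    has_aggregated_sample_measures=False,
    has_structural_variants=True,
    has_rare_snvs=True,
)
flagged = present_content_features(example)
```

A reviewer could then go through each flagged feature and the four descriptive fields one by one. How much weight each feature carries is left open here, since the abstract does not specify it.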
Identifiers
pubmed: 38935957
pii: v5i1e54332
doi: 10.2196/54332
Publication types
Journal Article
Review
Languages
eng
Pagination
e54332
Copyright information
©Mara Thomas, Nuria Mackes, Asad Preuss-Dodhy, Thomas Wieland, Markus Bundschus. Originally published in JMIR Bioinformatics and Biotechnology (https://bioinform.jmir.org), 27.05.2024.