Membership inference attacks against synthetic health data.

Contrastive representation learning; Electronic health record; Membership inference; Synthetic data

Journal

Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413

Publication information

Publication date:
01 2022
History:
received: 15 09 2021
revised: 17 11 2021
accepted: 08 12 2021
pubmed: 18 12 2021
medline: 3 2 2022
entrez: 17 12 2021
Status: ppublish

Abstract

Synthetic data generation has emerged as a promising method to protect patient privacy while sharing individual-level health data. Intuitively, sharing synthetic data should reduce disclosure risks because no explicit linkage is retained between the synthetic records and the real data upon which it is based. However, the risks associated with synthetic data are still evolving, and what seems protected today may not be tomorrow. In this paper, we show that membership inference attacks, whereby an adversary infers if the data from certain target individuals (known to the adversary a priori) were relied upon by the synthetic data generation process, can be substantially enhanced through state-of-the-art machine learning frameworks, which calls into question the protective nature of existing synthetic data generators. Specifically, we formulate the membership inference problem from the perspective of the data holder, who aims to perform a disclosure risk assessment prior to sharing any health data. To support such an assessment, we introduce a framework for effective membership inference against synthetic health data without specific assumptions about the generative model or a well-defined data structure, leveraging the principles of contrastive representation learning. To illustrate the potential for such an attack, we conducted experiments against synthesis approaches using two datasets derived from several health data resources (Vanderbilt University Medical Center, the All of Us Research Program) to determine the upper bound of risk brought by an adversary who invokes an optimal strategy. The results indicate that partially synthetic data are vulnerable to membership inference at a very high rate. By contrast, fully synthetic data are only marginally susceptible and, in most cases, could be deemed sufficiently protected from membership inference.
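
To make the idea of membership inference concrete, the following is a minimal sketch of a simple nearest-neighbor distance baseline, not the contrastive representation learning framework described in the abstract. It assumes binary patient records and a toy form of partial synthesis in which a few features of each real record are redrawn at random. All function names, the threshold, and the data are hypothetical and chosen only for illustration.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length binary records differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(target, synthetic):
    """Distance from a target record to its closest synthetic record."""
    return min(hamming(target, s) for s in synthetic)


def infer_membership(targets, synthetic, threshold):
    """Guess 'member' when a target lies within `threshold` of a synthetic record."""
    return [min_distance(t, synthetic) <= threshold for t in targets]


def partially_synthesize(record, n_resample, rng):
    """Toy partial synthesis (an assumption): copy a record, redraw a few features."""
    out = list(record)
    for i in rng.sample(range(len(out)), n_resample):
        out[i] = rng.randint(0, 1)
    return out


# Hypothetical toy data: 50 training records and 50 unseen records, 30 features each.
rng = random.Random(0)
n_features = 30
members = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(50)]
non_members = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(50)]

# Each synthetic record differs from its source record in at most 3 positions.
synthetic = [partially_synthesize(r, 3, rng) for r in members]

member_guesses = infer_membership(members, synthetic, threshold=3)
non_member_guesses = infer_membership(non_members, synthetic, threshold=3)

print("members flagged:", sum(member_guesses), "of", len(members))
print("non-members flagged:", sum(non_member_guesses), "of", len(non_members))
```

In this toy setting every training record stays within the threshold of its own partially synthetic copy, which loosely mirrors the abstract's observation that partially synthetic data are highly vulnerable. A fully synthetic generator would not preserve such record-level closeness, so a distance threshold of this kind would be expected to separate members from non-members far less reliably.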

Identifiers

pubmed: 34920126
pii: S1532-0464(21)00306-3
doi: 10.1016/j.jbi.2021.103977
pmc: PMC8766950
mid: NIHMS1765731

Publication types

Journal Article; Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

103977

Grants

Agency: NIH HHS
ID: U2C OD023196
Country: United States
Agency: NCATS NIH HHS
ID: UL1 TR002243
Country: United States

Copyright information

Copyright © 2021 Elsevier Inc. All rights reserved.

Authors

Ziqi Zhang (Z)

Vanderbilt University, 2525 West End Avenue, Nashville, TN 37240, United States. Electronic address: ziqi.zhang@vanderbilt.edu.

Chao Yan (C)

Vanderbilt University, 2525 West End Avenue, Nashville, TN 37240, United States.

Bradley A Malin (BA)

Vanderbilt University, 2525 West End Avenue, Nashville, TN 37240, United States; Vanderbilt University Medical Center, 2525 West End Avenue, Nashville, TN 37240, United States.
