Generative Adversarial Networks for Creating Synthetic Free-Text Medical Data: A Proposal for Collaborative Research and Re-use of Machine Learning Models.


Journal

AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science
ISSN: 2153-4063
Abbreviated title: AMIA Jt Summits Transl Sci Proc
Country: United States
NLM ID: 101539486

Publication information

Publication date:
History:
entrez: 2021-08-30
pubmed: 2021-08-31
medline: 2021-09-11
Status: epublish

Abstract

Restrictions in sharing Patient Health Identifiers (PHI) limit cross-organizational re-use of free-text medical data. We leverage Generative Adversarial Networks (GAN) to produce synthetic unstructured free-text medical data with low re-identification risk, and assess the suitability of these datasets to replicate machine learning models. We trained GAN models using unstructured free-text laboratory messages pertaining to salmonella, and identified the most accurate models for creating synthetic datasets that reflect the informational characteristics of the original dataset. Natural Language Generation metrics comparing the real and synthetic datasets demonstrated high similarity. Decision models generated using these datasets reported high performance metrics. There was no statistically significant difference in performance measures reported by models trained using real and synthetic datasets. Our results inform the use of GAN models to generate synthetic unstructured free-text data with limited re-identification risk, and use of this data to enable collaborative research and re-use of machine learning models.

Identifiers

pubmed: 34457148
pii: 3478299
pmc: PMC8378601

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

335-344

Copyright information

©2021 AMIA - All rights reserved.

Authors

Suranga N Kasthurirathne (SN)

Regenstrief Institute, Indianapolis, IN, USA.
Indiana University School of Medicine, Indianapolis, IN, USA.

Gregory Dexter (G)

Purdue University, Indianapolis, IN, USA.

Shaun J Grannis (SJ)

Regenstrief Institute, Indianapolis, IN, USA.
Indiana University School of Medicine, Indianapolis, IN, USA.
