Generative Adversarial Networks for Creating Synthetic Free-Text Medical Data: A Proposal for Collaborative Research and Re-use of Machine Learning Models.
Journal
AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science
ISSN: 2153-4063
Abbreviated title: AMIA Jt Summits Transl Sci Proc
Country: United States
NLM ID: 101539486
Publication information
Publication date:
History:
entrez: 2021-08-30
pubmed: 2021-08-31
medline: 2021-09-11
Status:
epublish
Abstract
Restrictions in sharing Patient Health Identifiers (PHI) limit cross-organizational re-use of free-text medical data. We leverage Generative Adversarial Networks (GAN) to produce synthetic unstructured free-text medical data with low re-identification risk, and assess the suitability of these datasets to replicate machine learning models. We trained GAN models using unstructured free-text laboratory messages pertaining to salmonella, and identified the most accurate models for creating synthetic datasets that reflect the informational characteristics of the original dataset. Natural Language Generation metrics comparing the real and synthetic datasets demonstrated high similarity. Decision models generated using these datasets reported high performance metrics. There was no statistically significant difference in performance measures reported by models trained using real and synthetic datasets. Our results inform the use of GAN models to generate synthetic unstructured free-text data with limited re-identification risk, and use of this data to enable collaborative research and re-use of machine learning models.
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
335-344
Copyright information
©2021 AMIA - All rights reserved.