Analyzing Medical Research Results Based on Synthetic Data and Their Relation to Real Data Results: Systematic Comparison From Five Observational Studies.

MDClone; big data analysis; electronic medical records; synthetic data; validation study

Journal

JMIR medical informatics
ISSN: 2291-9694
Abbreviated title: JMIR Med Inform
Country: Canada
NLM ID: 101645109

Publication information

Publication date:
20 Feb 2020
History:
received: 03 Oct 2019
accepted: 27 Dec 2019
revised: 01 Dec 2019
entrez: 5 Mar 2020
pubmed: 5 Mar 2020
medline: 5 Mar 2020
Status: epublish

Abstract sections

BACKGROUND
Privacy restrictions limit access to protected patient-derived health information for research purposes. Consequently, data anonymization is required to give researchers access to data for initial analysis before institutional review board approval is granted. A system installed and activated at our institution generates synthetic data that mimic data from real electronic medical records and list only fictitious patients.
OBJECTIVE
This paper aimed to validate the results obtained when analyzing synthetic structured data for medical research. A comprehensive validation process concerning meaningful clinical questions and various types of data was conducted to assess the accuracy and precision of statistical estimates derived from synthetic patient data.
METHODS
A cross-hospital project was conducted to validate results obtained from synthetic data produced for five contemporary studies on various topics. For each study, results derived from synthetic data were compared with those based on real data. In addition, repeatedly generated synthetic datasets were used to estimate the bias and stability of results obtained from synthetic data.
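The repeated-generation comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say how bias and stability were computed, so the sketch assumes bias is the mean difference between the synthetic estimates and the real-data estimate, and stability is the standard deviation of the synthetic estimates. The function name and all numbers are hypothetical.

```python
from statistics import mean, stdev


def bias_and_stability(real_estimate, synthetic_estimates):
    """Summarize repeated synthetic-data estimates against one real-data estimate.

    Assumed definitions (not taken from the paper):
      bias      = mean of the synthetic estimates minus the real estimate
      stability = sample standard deviation of the synthetic estimates
    """
    return {
        "bias": mean(synthetic_estimates) - real_estimate,
        "stability_sd": stdev(synthetic_estimates),
    }


# Hypothetical values for illustration only; they are not results from the study.
real_odds_ratio = 2.0
synthetic_odds_ratios = [1.8, 2.1, 1.9, 2.2, 2.0]

summary = bias_and_stability(real_odds_ratio, synthetic_odds_ratios)
print(summary)
```

Under these assumed definitions, a bias near zero with a small standard deviation would indicate that a single synthetic dataset is likely to give an estimate close to the real-data result.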
RESULTS
This study demonstrated that results derived from synthetic data were predictive of results from real data. When the number of patients was large relative to the number of variables used, highly accurate and strongly consistent results were observed between synthetic and real data. For studies based on smaller populations that accounted for confounders and modifiers by multivariate models, predictions were of moderate accuracy, yet clear trends were correctly observed.
CONCLUSIONS
The use of synthetic structured data provides a close estimate of real data results and is thus a powerful tool for shaping research hypotheses and obtaining estimated analyses without risking patient privacy. Synthetic data enable broad access to data (eg, for out-of-organization researchers) and rapid, safe, and repeatable analysis of data in hospitals or other health organizations where patient privacy is a primary value.

Identifiers

pubmed: 32130148
pii: v8i2e16492
doi: 10.2196/16492
pmc: PMC7059086

Publication types

Journal Article

Languages

eng

Pagination

e16492

Copyright information

©Anat Reiner Benaim, Ronit Almog, Yuri Gorelik, Irit Hochberg, Laila Nassar, Tanya Mashiach, Mogher Khamaisi, Yael Lurie, Zaher S Azzam, Johad Khoury, Daniel Kurnik, Rafael Beyar. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 20.02.2020.

Authors

Anat Reiner Benaim (A)

Clinical Epidemiology Unit, Rambam Health Care Campus, Haifa, Israel.

Ronit Almog (R)

Clinical Epidemiology Unit, Rambam Health Care Campus, Haifa, Israel.
School of Public Health, University of Haifa, Haifa, Israel.

Yuri Gorelik (Y)

Department of Internal Medicine D, Rambam Health Care Campus, Haifa, Israel.

Irit Hochberg (I)

Institute of Endocrinology, Diabetes and Metabolism, Rambam Health Care Campus, Haifa, Israel.
The Ruth & Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel.

Laila Nassar (L)

Clinical Pharmacology and Toxicology Section, Rambam Health Care Campus, Haifa, Israel.

Tanya Mashiach (T)

Clinical Epidemiology Unit, Rambam Health Care Campus, Haifa, Israel.

Mogher Khamaisi (M)

Department of Internal Medicine D, Rambam Health Care Campus, Haifa, Israel.
Institute of Endocrinology, Diabetes and Metabolism, Rambam Health Care Campus, Haifa, Israel.
Diabetes Stem Cell Laboratory, Rambam Health Care Campus, Haifa, Israel.

Yael Lurie (Y)

The Ruth & Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel.
Clinical Pharmacology and Toxicology Section, Rambam Health Care Campus, Haifa, Israel.

Zaher S Azzam (ZS)

Department of Internal Medicine B, Rambam Health Care Campus, Haifa, Israel.
The Ruth & Bruce Rappaport Faculty of Medicine and Rappaport Research Institute, Technion-Israel Institute of Technology, Haifa, Israel.

Johad Khoury (J)

Department of Internal Medicine B, Rambam Health Care Campus, Haifa, Israel.

Daniel Kurnik (D)

The Ruth & Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel.
Clinical Pharmacology Unit, Rambam Health Care Campus, Haifa, Israel.

Rafael Beyar (R)

The Ruth & Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel.
Rambam Health Care Campus, Haifa, Israel.
