Constructing synthetic populations in the age of big data.

Keywords: Disclosure risk; Synthetic population

Journal

Population health metrics
ISSN: 1478-7954
Abbreviated title: Popul Health Metr
Country: England
NLM ID: 101178411

Publication information

Publication date:
31 Oct 2023
History:
received: 15 Mar 2022
accepted: 19 Oct 2023
medline: 2 Nov 2023
pubmed: 1 Nov 2023
entrez: 1 Nov 2023
Status: epublish

Abstract

BACKGROUND
Developing public health intervention models with micro-simulations requires extensive personal information about inhabitants, such as socio-demographic, economic and health figures. Such data must remain confidential, yet the data used for modelling should reflect realistic scenarios. Data of this kind can be accessed only in secured environments and are not directly available for open-source micro-simulation models. The aim of this paper is to illustrate a method for constructing synthetic data by predicting individual features with models based on confidential data on health and socio-economic determinants of the entire Dutch population.
METHODS
Administrative records and health registry data were linked to socio-economic characteristics and self-reported lifestyle factors. For the entire Dutch population (n = 16,778,708), all socio-demographic information except lifestyle factors was available. Lifestyle factors were available from the 2012 Dutch Health Monitor (n = 370,835). Regression models were used to sequentially predict individual features.
RESULTS
The synthetic population resembles the original confidential population. Features predicted in the first stages of the sequential procedure closely match those in the original population, while features predicted in later stages accumulate limitations arising from data quality and from previously modelled features.
CONCLUSIONS
By combining socio-demographic, economic, health and lifestyle-related data at the individual level on a large scale, our method provides a powerful tool to construct a synthetic population of good quality and with no confidentiality issues.
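
The sequential prediction described in the Methods can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the regression families, the order of the variables, or the software used. The sketch assumes three hypothetical continuous features (`age`, `income`, `bmi`), ordinary least squares with Gaussian residual noise at every step, and toy data generated on the spot in place of confidential records. The names `fit_ols` and `synthesize` are illustrative.

```python
import random
import statistics


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations.

    Returns the coefficients (intercept first) and the residual
    standard deviation. With no predictors this is the sample mean.
    """
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r))
             for r, yi in zip(rows, y)]
    return beta, statistics.pstdev(resid)


def synthesize(data, order, n, seed=0):
    """Generate n synthetic records, one feature at a time.

    Feature j is modelled on features 0..j-1 of the real data, then
    drawn conditional on the synthetic values generated so far.
    """
    rng = random.Random(seed)
    models = []
    for j, name in enumerate(order):
        X = [[row[p] for p in order[:j]] for row in data]
        y = [row[name] for row in data]
        models.append(fit_ols(X, y))
    synthetic = []
    for _ in range(n):
        person = {}
        for j, (name, (beta, sigma)) in enumerate(zip(order, models)):
            x = [1.0] + [person[p] for p in order[:j]]
            mean = sum(b * v for b, v in zip(beta, x))
            person[name] = rng.gauss(mean, sigma)
        synthetic.append(person)
    return synthetic


# Toy stand-in for the confidential data (assumed relationships).
_rng = random.Random(1)
confidential = []
for _ in range(500):
    age = _rng.uniform(20, 80)
    income = 10 + 0.5 * age + _rng.gauss(0, 5)
    bmi = 22 + 0.05 * age - 0.02 * income + _rng.gauss(0, 2)
    confidential.append({"age": age, "income": income, "bmi": bmi})

synthetic = synthesize(confidential, ["age", "income", "bmi"], n=1000)
for name in ("age", "income", "bmi"):
    real_mean = statistics.mean(r[name] for r in confidential)
    synth_mean = statistics.mean(r[name] for r in synthetic)
    print(name, round(real_mean, 2), round(synth_mean, 2))
```

Because each feature is drawn from a model conditioned on the synthetic values already generated, errors in early models propagate to later features, which is consistent with the pattern reported in the Results. The toy models are also deliberately crude: a Gaussian draw cannot reproduce the uniform age distribution of the toy input, so only means and linear relationships are preserved here.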

Identifiers

pubmed: 37907904
doi: 10.1186/s12963-023-00319-5
pii: 10.1186/s12963-023-00319-5
pmc: PMC10617102

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

19

Copyright information

© 2023. The Author(s).

Authors

Mioara A Nicolaie (MA)

Centre for Nutrition, Prevention and Health Services, RIVM (National Institute for Public Health and the Environment), P.O. Box 1, Mailbox 86, 3720 BA, Bilthoven, The Netherlands. alina.nicolaie@rivm.nl.

Koen Füssenich (K)

Capaciteit Orgaan (Advisory Committee on Medical Manpower Planning), Mercatorlaan 1200, 3525 BL, Utrecht, The Netherlands.

Caroline Ameling (C)

Centre for Nutrition, Prevention and Health Services, RIVM (National Institute for Public Health and the Environment), P.O. Box 1, Mailbox 86, 3720 BA, Bilthoven, The Netherlands.

Hendriek C Boshuizen (HC)

Centre for Nutrition, Prevention and Health Services, RIVM (National Institute for Public Health and the Environment), P.O. Box 1, Mailbox 86, 3720 BA, Bilthoven, The Netherlands.
