A proposed de-identification framework for a cohort of children presenting at a health facility in Uganda.
Journal
PLOS Digital Health
ISSN: 2767-3170
Abbreviated title: PLOS Digit Health
Country: United States
NLM ID: 9918335064206676
Publication information
Date of publication:
Aug 2022
History:
received: 24 Mar 2022
accepted: 08 Jul 2022
entrez: 22 Feb 2023
pubmed: 23 Feb 2023
medline: 23 Feb 2023
Status:
epublish
Abstract
Data sharing has enormous potential to accelerate and improve the accuracy of research, strengthen collaborations, and restore trust in the clinical research enterprise. Nevertheless, there remains reluctance to openly share raw data sets, in part due to concerns regarding research participant confidentiality and privacy. Statistical data de-identification is an approach that can be used to preserve privacy and facilitate open data sharing. We have proposed a standardized framework for the de-identification of data generated from cohort studies in children in a low- and middle-income country. We applied a standardized de-identification framework to a data set comprising 241 health-related variables collected from a cohort of 1750 children with acute infections from Jinja Regional Referral Hospital in Eastern Uganda. Variables were labeled as direct and quasi-identifiers based on conditions of replicability, distinguishability, and knowability, with consensus from two independent evaluators. Direct identifiers were removed from the data set, while a statistical risk-based de-identification approach using the k-anonymity model was applied to quasi-identifiers. Qualitative assessment of the level of privacy invasion associated with data set disclosure was used to determine an acceptable re-identification risk threshold and the corresponding k-anonymity requirement. A de-identification model using generalization followed by suppression was applied using a logical stepwise approach to achieve k-anonymity. The utility of the de-identified data was demonstrated using a typical clinical regression example. The de-identified data set was published on the Pediatric Sepsis Data CoLaboratory Dataverse, which provides moderated data access. Researchers are faced with many challenges when providing access to clinical data. We provide a standardized de-identification framework that can be adapted and refined based on specific context and risks.
This process will be combined with moderated access to foster coordination and collaboration in the clinical research community.
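The generalization-then-suppression route to k-anonymity described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names (`patient_id`, `age_months`, `sex`, `district`), the 12-month age bands, the toy records, and the choice to suppress whole records rather than individual cells are all assumptions made for illustration. A data set is k-anonymous when every combination of quasi-identifier values is shared by at least k records.

```python
from collections import Counter

def generalize_age(age_months, width=12):
    """Replace an exact age with a coarser band (band width is an assumption)."""
    low = (age_months // width) * width
    return f"{low}-{low + width - 1}"

def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def k_of(records, quasi_identifiers):
    """Smallest equivalence-class size; the data set is k-anonymous for this k."""
    sizes = class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0

def suppress_small_classes(records, quasi_identifiers, k):
    """Drop records whose equivalence class has fewer than k members."""
    sizes = class_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] >= k]

def deidentify(records, direct_identifiers, k):
    """Remove direct identifiers, generalize age, then suppress small classes."""
    generalized = []
    for r in records:
        row = {key: v for key, v in r.items() if key not in direct_identifiers}
        row["age_band"] = generalize_age(row.pop("age_months"))
        generalized.append(row)
    qis = ["age_band", "sex", "district"]
    return suppress_small_classes(generalized, qis, k), qis

# Toy records with made-up values (not taken from the study).
records = [
    {"patient_id": 1, "age_months": 3,  "sex": "F", "district": "A"},
    {"patient_id": 2, "age_months": 7,  "sex": "F", "district": "A"},
    {"patient_id": 3, "age_months": 11, "sex": "F", "district": "A"},
    {"patient_id": 4, "age_months": 14, "sex": "M", "district": "A"},
    {"patient_id": 5, "age_months": 20, "sex": "M", "district": "A"},
    {"patient_id": 6, "age_months": 40, "sex": "M", "district": "B"},
]

released, qis = deidentify(records, direct_identifiers={"patient_id"}, k=2)
print(len(released), k_of(released, qis))
```

In this toy case every raw record is unique on its quasi-identifiers. Generalizing age groups five of the six records into classes of size two or more, and the one remaining unique record is suppressed. Generalizing first keeps the number of suppressed records low, which is one reason to apply the two steps in that order.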
Identifiers
pubmed: 36812586
doi: 10.1371/journal.pdig.0000027
pii: PDIG-D-22-00081
pmc: PMC9931294
Publication types
Journal Article
Languages
eng
Pagination
e0000027
Copyright information
Copyright: © 2022 Mawji et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
JMA serves as a section editor for PLOS Digital Health. The peer-review process was guided by an independent editor, and the authors have no other competing interests to declare.