Pseudonymization for research data collection: is the juice worth the squeeze?


Journal

BMC medical informatics and decision making
ISSN: 1472-6947
Abbreviated title: BMC Med Inform Decis Mak
Country: England
NLM ID: 101088682

Publication information

Publication date:
4 September 2019
History:
received: 1 March 2018
accepted: 29 August 2019
entrez: 6 September 2019
pubmed: 6 September 2019
medline: 23 February 2020
Status: epublish

Abstract

The collection of data and biospecimens which characterize patients and probands in depth is a core element of modern biomedical research. Relevant data must be considered highly sensitive and needs to be protected from unauthorized use and re-identification. In this context, laws, regulations, guidelines and best practices often recommend or mandate pseudonymization, which means that directly identifying data of subjects (e.g. names and addresses) is stored separately from data which is primarily needed for scientific analyses. When (authorized) re-identification of subjects is not an exceptional but a common procedure, e.g. due to longitudinal data collection, implementing pseudonymization can significantly increase the complexity of software solutions. For example, data stored in distributed databases need to be dynamically combined with each other, which requires additional interfaces for communicating between the various subsystems. This increased complexity may lead to new attack vectors for intruders. This is in contrast to the objective of improving data protection. What is lacking is a standardized process of evaluating and reporting risks, threats and countermeasures, which can be used to test whether integrating pseudonymization methods into data collection systems actually improves upon the degree of protection provided by system designs that simply follow common IT security best practices and implement fine-grained role-based access control models. To demonstrate that the methods used to describe systems employing pseudonymized data management are currently heterogeneous and ad hoc, we examined the extent to which twelve recent studies address each of the six basic security properties defined by the International Organization for Standardization (ISO) standard 27000. We show inconsistencies across the studies, with most of them failing to mention one or more security properties.
We discuss the degree of privacy protection provided by integrating pseudonymization into research data collection processes. We conclude that (1) more research is needed on the interplay of pseudonymity, information security and data protection, (2) problem-specific guidelines for evaluating and reporting risks, threats and countermeasures should be developed, and (3) future work on pseudonymized research data collection should include the results of such structured and integrated analyses.
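
To illustrate the separation the abstract describes, the following is a minimal sketch of pseudonymized storage, not the authors' design. All class and method names (`IdentityStore`, `ResearchStore`, `register`, `reidentify`) are hypothetical, the boolean `authorized` flag stands in for a real access control check, and both stores live in memory rather than in separate, independently secured systems.

```python
import secrets


class IdentityStore:
    """Hypothetical store for directly identifying data, keyed by pseudonym."""

    def __init__(self):
        self._records = {}

    def register(self, name, address):
        # A random pseudonym carries no information about the subject.
        pseudonym = secrets.token_hex(16)
        self._records[pseudonym] = {"name": name, "address": address}
        return pseudonym

    def reidentify(self, pseudonym, authorized):
        # Assumption: a real system would check roles, not a boolean flag.
        if not authorized:
            raise PermissionError("re-identification requires authorization")
        return dict(self._records[pseudonym])


class ResearchStore:
    """Hypothetical store for scientific data; it never sees names or addresses."""

    def __init__(self):
        self._observations = {}

    def add(self, pseudonym, visit, measurement):
        self._observations.setdefault(pseudonym, []).append((visit, measurement))

    def history(self, pseudonym):
        return list(self._observations.get(pseudonym, []))


# Longitudinal collection: each follow-up visit is linked through the pseudonym.
identities = IdentityStore()
research = ResearchStore()
p = identities.register("Jane Doe", "1 Example Street")
research.add(p, visit=1, measurement=5.4)
research.add(p, visit=2, measurement=5.9)
```

Even in this toy form, recontacting a subject requires a call across the two stores through `reidentify`. That extra interface is the kind of added component the abstract argues should be weighed against the protection gained.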

Identifiers

pubmed: 31484555
doi: 10.1186/s12911-019-0905-x
pii: 10.1186/s12911-019-0905-x
pmc: PMC6727563

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

178

Authors

Florian Kohlmayer (F)

Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany.

Ronald Lautenschläger (R)

Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany.

Fabian Prasser (F)

Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany. fabian.prasser@tum.de.
