Pooling individual participant data from randomized controlled trials: Exploring potential loss of information.


Journal

PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2020
History:
received: 2019-11-28
accepted: 2020-04-24
entrez: 2020-05-13
pubmed: 2020-05-13
medline: 2020-08-01
Status: epublish

Abstract

Pooling individual participant data to enable pooled analyses is often complicated by diversity in variables across the available datasets. Recoding original variables is therefore often necessary to build a pooled dataset. We aimed to quantify how much information is lost in this process and to what extent this jeopardizes the validity of analysis results. Data were derived from a platform that was developed to pool data from three randomized controlled trials on the effect of treatment of cardiovascular risk factors on cognitive decline or dementia. We quantified loss of information using the R-squared of linear regression models with pooled variables as a function of their original variable(s). When the R-squared was below 0.8, we additionally explored the potential impact of the loss of information on future analyses. We did this second step by checking whether the beta coefficient of the predictor differed by more than 10% between linear regression models that included either the original or the recoded variable as a confounder. In a simulation, we randomly sampled numbers, recoded those ≤1000 to 0 and those >1000 to 1, varied the range of the continuous variable, the ratio of recoded zeroes to recoded ones, or both, and again extracted the R-squared from linear models to quantify information loss. The R-squared was below 0.8 for 8 out of 91 recoded variables. In 4 cases this had a substantial impact on the regression models, particularly when a continuous variable was recoded into a discrete variable. Our simulation showed that the least information is lost when the ratio of recoded zeroes to ones is 1:1. Large, pooled datasets provide great opportunities, justifying the efforts for data harmonization. Still, caution is warranted when using recoded variables whose variance is only partly explained by their original variables, as this may jeopardize the validity of study results.
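
The simulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the uniform sampling distribution, the sample size, the seed, the chosen ranges, and the names `r_squared` and `simulate` are all assumptions made for illustration. Only the cut-off of 1000 and the use of the R-squared of a linear model come from the abstract.

```python
import numpy as np

THRESHOLD = 1000  # cut-off from the abstract: <=1000 -> 0, >1000 -> 1


def r_squared(original, recoded):
    """R-squared of the simple linear model recoded ~ original.

    With a single predictor and an intercept, this equals the squared
    Pearson correlation between the two variables.
    """
    return float(np.corrcoef(original, recoded)[0, 1] ** 2)


def simulate(low, high, n=100_000, seed=0):
    """Sample n values uniformly on [low, high) (an assumed distribution),
    dichotomize them at THRESHOLD, and return (share of ones, R-squared)."""
    rng = np.random.default_rng(seed)
    original = rng.uniform(low, high, size=n)
    recoded = (original > THRESHOLD).astype(float)
    return float(recoded.mean()), r_squared(original, recoded)


# Hypothetical ranges chosen so that the zero:one ratio is about 1:1, 3:1 and 9:1.
scenarios = {
    "1:1": (0, 2000),
    "3:1": (0, 4000 / 3),
    "9:1": (0, 10000 / 9),
}

for label, (low, high) in scenarios.items():
    share_ones, r2 = simulate(low, high)
    print(f"zeroes:ones {label}  share of ones={share_ones:.3f}  R-squared={r2:.3f}")
```

Under the uniform assumption used here, the R-squared depends only on the share p of recoded ones and is approximately 3p(1 − p), so it peaks near 0.75 at a 1:1 ratio and falls as the ratio becomes more unbalanced. Other sampling distributions would give different values, so the sketch illustrates the direction of the effect rather than the figures reported in the paper.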

Identifiers

pubmed: 32396543
doi: 10.1371/journal.pone.0232970
pii: PONE-D-19-33001
pmc: PMC7217432

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e0232970

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Lennard L van Wanrooij (LL)

Department of Neurology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.

Marieke P Hoevenaar-Blom (MP)

Department of Neurology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.
Department of Neurology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands.

Nicola Coley (N)

Department of Epidemiology and Public Health, Toulouse University Hospital, Toulouse, France.
INSERM, University of Toulouse UMR1027, Toulouse, France.

Tiia Ngandu (T)

Chronic Disease Prevention Unit, National Institute for Health and Welfare, Helsinki, Finland.

Yannick Meiller (Y)

Department of Information and Operations Management, ESCP Europe, Paris, France.

Juliette Guillemont (J)

INSERM, University of Toulouse, Toulouse, France.

Anna Rosenberg (A)

Department of Neurology, Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland.

Cathrien R L Beishuizen (CRL)

Department of Neurology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.

Eric P Moll van Charante (EP)

Department of General Practice, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.

Hilkka Soininen (H)

Department of Neurology, Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland.
Neurocenter, Neurology, Kuopio University Hospital, Kuopio, Finland.

Carol Brayne (C)

Department of Public Health and Primary Care, Cambridge Institute of Public Health, University of Cambridge, Cambridge, United Kingdom.

Sandrine Andrieu (S)

Department of Epidemiology and Public Health, Toulouse University Hospital, Toulouse, France.
INSERM, University of Toulouse UMR1027, Toulouse, France.

Miia Kivipelto (M)

Chronic Disease Prevention Unit, National Institute for Health and Welfare, Helsinki, Finland.
Department of Neurology, Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland.
Aging Research Center, Karolinska Institutet, Stockholm University, Stockholm, Sweden.
Karolinska Institutet Center for Alzheimer Research, Stockholm, Sweden.

Edo Richard (E)

Department of Neurology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.
Department of Neurology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands.
