Pooling individual participant data from randomized controlled trials: Exploring potential loss of information.

Cognitive Dysfunction / prevention & control; Computer Simulation; Data Interpretation, Statistical; Dementia / prevention & control; Humans; Linear Models; Meta-Analysis as Topic; Randomized Controlled Trials as Topic / statistics & numerical data; Reproducibility of Results; Sample Size

Journal

PloS one

ISSN: 1932-6203

Abbreviated title: PLoS One

Country: United States

NLM ID: 101285081

Publication information

Publication date:
2020

History:

received: 28 November 2019

accepted: 24 April 2020

entrez: 13 May 2020

pubmed: 13 May 2020

medline: 1 August 2020

Status: epublish

Abstract

Pooling individual participant data for joint analysis is often complicated by differences in how variables are defined across the available datasets. Recoding the original variables is therefore often necessary to build a pooled dataset. We aimed to quantify how much information is lost in this process and to what extent this loss jeopardizes the validity of analysis results. Data were derived from a platform that was developed to pool data from three randomized controlled trials on the effect of treatment of cardiovascular risk factors on cognitive decline or dementia. We quantified loss of information using the R-squared of linear regression models in which each pooled variable was modeled as a function of its original variable(s). When the R-squared was below 0.8, we additionally explored the potential impact of the loss of information on future analyses. In this second step, we checked whether the beta coefficient of the predictor differed by more than 10% depending on whether the original or the recoded variable was added as a confounder to a linear regression model. In a simulation, we randomly sampled numbers, recoded those ≤1000 to 0 and those >1000 to 1, varied the range of the continuous variable, the ratio of recoded zeroes to recoded ones, or both, and again extracted the R-squared from linear models to quantify information loss. The R-squared was below 0.8 for 8 out of 91 recoded variables. In 4 cases this had a substantial impact on the regression models, particularly when a continuous variable was recoded into a discrete variable. Our simulation showed that the least information is lost when the ratio of recoded zeroes to ones is 1:1. Large, pooled datasets provide great opportunities, justifying the efforts for data harmonization. Still, caution is warranted when using recoded variables whose variance is only partly explained by their original variables, as this may jeopardize the validity of study results.
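The simulation summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the uniform distribution, the sample size, the fixed range width of 1000, the seed, and the function names `r_squared` and `simulate` are all assumptions chosen for illustration. The sketch dichotomizes a continuous variable at 1000 and reports the R-squared of a linear regression of the recoded variable on the original one, for several shares of values recoded to 1.

```python
import numpy as np


def r_squared(original, recoded):
    """R-squared of a simple linear regression of `recoded` on `original`."""
    slope, intercept = np.polyfit(original, recoded, deg=1)
    fitted = slope * original + intercept
    ss_res = np.sum((recoded - fitted) ** 2)
    ss_tot = np.sum((recoded - recoded.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def simulate(share_of_ones, n=20_000, threshold=1000.0, width=1000.0, seed=0):
    """Recode a uniform variable at `threshold` and return the R-squared.

    Assumption: the sampling interval is shifted so that roughly
    `share_of_ones` of the draws fall above the threshold.
    """
    rng = np.random.default_rng(seed)
    low = threshold - (1.0 - share_of_ones) * width
    original = rng.uniform(low, low + width, size=n)
    # Values <= threshold become 0, values > threshold become 1.
    recoded = (original > threshold).astype(float)
    return r_squared(original, recoded)


# R-squared for different shares of values recoded to 1.
results = {share: simulate(share) for share in (0.1, 0.3, 0.5, 0.7, 0.9)}
for share, value in results.items():
    print(f"share of ones = {share:.1f}: R-squared = {value:.3f}")
```

Under these assumptions the R-squared is highest when about half of the values are recoded to 1, in line with the 1:1 finding reported above. For a uniformly distributed variable the value at an even split is about 0.75, so even the most favorable dichotomization would fall below the 0.8 cut-off used in the study.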

Identifiers

DOI: 10.1371/journal.pone.0232970

PMID: 32396543

PMC: PMC7217432

PII: PONE-D-19-33001

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Pagination

e0232970

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Lennard L van Wanrooij (LL)

Marieke P Hoevenaar-Blom (MP)

Nicola Coley (N)

Tiia Ngandu (T)

Yannick Meiller (Y)

Juliette Guillemont (J)

Anna Rosenberg (A)

Cathrien R L Beishuizen (CRL)

Eric P Moll van Charante (EP)

Hilkka Soininen (H)

Carol Brayne (C)

Sandrine Andrieu (S)

Miia Kivipelto (M)

Edo Richard (E)
