Sample size issues in time series regressions of counts on environmental exposures.
Environment
Poisson regression
Power
Sample size
Statistics
Time series regression
Journal
BMC medical research methodology
ISSN: 1471-2288
Abbreviated title: BMC Med Res Methodol
Country: England
NLM ID: 100968545
Publication information
Publication date:
28 January 2020
History:
received: 30 June 2018
accepted: 23 December 2019
entrez: 30 January 2020
pubmed: 30 January 2020
medline: 12 January 2021
Status:
epublish
Abstract
Regression analyses of time series of disease counts on environmental determinants are a prominent component of environmental epidemiology. For planning such studies, it can be useful to predict the precision of estimated coefficients and the power to detect associations of a given magnitude. Existing generic approaches for this have been found somewhat complex to apply and do not easily extend to multiple series studies analysed in two stages. We have sought a simpler approximate approach which can easily extend to multiple series and give insight into the factors determining precision. We derive approximate expressions for precision, and hence power, in single and multiple time series studies of counts from basic statistical theory, compare the precision predicted by these with that estimated by analysis of real data from 51 cities of varying size, and illustrate the use of these estimators in a realistic planning scenario. In single series studies with a Poisson outcome distribution, precision and power depend only on the usable variation of exposure (i.e. that conditional on covariates) and the total number of disease events, regardless of how many days those are spread over. In multiple time series (e.g. multi-city) studies focusing on the meta-analytic mean coefficient, the usable exposure variation and the total number of events (in all series) are again the sole determinants if there is no between-series heterogeneity or within-series overdispersion. With heterogeneity, its extent and the number of series become important. For all but the crudest approximation, the estimates of standard errors were on average within ±20% of those estimated in full analysis of actual data. Predicting the precision of coefficients from a planned time series study can be done simply and with limited information. The total number of disease events and the usable exposure variation are the dominant factors when overdispersion and between-series heterogeneity are low.
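The dependence on total events and usable exposure variation can be illustrated with a minimal Python sketch. It is not the authors' estimator: the abstract gives no formulas, so the code assumes the standard Fisher-information approximation for a Poisson log-linear model, SE ≈ sqrt(dispersion / total events) / (usable exposure SD), a two-sided Wald test for power, and an inverse-variance random-effects mean for multiple series. The function names (`approx_se`, `approx_power`, `pooled_se`) and all input values are hypothetical.

```python
import math
from statistics import NormalDist


def approx_se(total_events, usable_exposure_sd, dispersion=1.0):
    """Approximate SE of a log-rate coefficient in a single count series.

    Assumed form (not quoted from the abstract):
        SE = sqrt(dispersion / total_events) / usable_exposure_sd
    usable_exposure_sd is the exposure SD conditional on covariates;
    dispersion=1.0 corresponds to a pure Poisson outcome.
    """
    return math.sqrt(dispersion / total_events) / usable_exposure_sd


def approx_power(beta, se, alpha=0.05):
    """Approximate power of a two-sided Wald test for a true coefficient beta.

    Rejections in the tail opposite to the true effect are ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(beta) / se - z_crit)


def pooled_se(series_ses, tau=0.0):
    """SE of an inverse-variance weighted mean coefficient across series.

    tau is the assumed between-series SD of the true coefficients
    (tau=0.0 means no heterogeneity).
    """
    weights = [1.0 / (se ** 2 + tau ** 2) for se in series_ses]
    return 1.0 / math.sqrt(sum(weights))


if __name__ == "__main__":
    # Hypothetical planning inputs: 10,000 events in total, usable exposure
    # SD of 2 units, true log rate ratio of 0.01 per unit of exposure.
    se_single = approx_se(total_events=10_000, usable_exposure_sd=2.0)
    print("single-series SE:", se_single)
    print("single-series power:", approx_power(0.01, se_single))

    # The same events split evenly over two series, first without and
    # then with between-series heterogeneity.
    ses = [approx_se(5_000, 2.0), approx_se(5_000, 2.0)]
    print("pooled SE, tau=0:", pooled_se(ses))
    print("pooled SE, tau=0.01:", pooled_se(ses, tau=0.01))
```

Under these assumptions the number of days never enters the calculation, and with `tau=0` the pooled standard error equals that of a single series with the same total number of events, which matches the qualitative statements in the abstract. A non-zero `tau` makes the number of series matter, because each series then contributes at most 1/tau² to the total weight.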
Identifiers
pubmed: 31992211
doi: 10.1186/s12874-019-0894-6
pii: 10.1186/s12874-019-0894-6
pmc: PMC6988321
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
15
Grants
Agency: Medical Research Council
ID: MR/M022625/1
Country: United Kingdom
Agency: Medical Research Council
ID: MR/R013349/1
Country: United Kingdom