A comparison of statistical methods for modeling count data with an application to hospital length of stay.

Count data; Negative binomial regression; Poisson regression; Simulation study; Zero-inflated Poisson regression; Zero-inflated negative binomial regression

Journal

BMC Medical Research Methodology
ISSN: 1471-2288
Abbreviated title: BMC Med Res Methodol
Country: England
NLM ID: 100968545

Publication information

Publication date:
2022-08-04
History:
received: 2022-02-16
accepted: 2022-07-11
entrez: 2022-08-04
pubmed: 2022-08-05
medline: 2022-08-09
Status: epublish

Abstract

Hospital length of stay (LOS) is a key indicator of hospital care management efficiency, cost of care, and hospital planning. Hospital LOS is often used as a measure of a post-medical procedure outcome, as a guide to the benefit of a treatment of interest, or as an important risk factor for adverse events. Understanding the variability in hospital LOS is therefore an important focus in healthcare. Hospital LOS data can be treated as count data, with discrete and non-negative values, typically right-skewed, and often exhibiting excess zeros. In this study, we compared the performance of the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB) regression models using simulated and empirical data. Data were generated under different simulation scenarios with varying sample sizes, proportions of zeros, and levels of overdispersion. Analysis of hospital LOS was conducted using empirical data from the Medical Information Mart for Intensive Care database. Results showed that the Poisson and ZIP models performed poorly on overdispersed data. The ZIP model outperformed the other regression models when the overdispersion was due to zero-inflation only. The NB and ZINB regression models faced substantial convergence issues when incorrectly used to model equidispersed data. The NB model provided the best fit for overdispersed data and outperformed the ZINB model in many simulation scenarios with combinations of zero-inflation and overdispersion, regardless of the sample size. In the empirical data analysis, we demonstrated that fitting incorrect models to overdispersed data led to incorrect regression coefficient estimates and overstated the significance of some of the predictors. Based on this study, we recommend that researchers consider ZIP models for count data with zero-inflation only, and NB models for overdispersed data or data with combinations of zero-inflation and overdispersion. If the researcher believes there are two different data-generating mechanisms producing zeros, then the ZINB regression model may provide greater flexibility when modeling the zero-inflation and overdispersion.
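
The four data-generating settings named in the abstract (equidispersed, zero-inflated, overdispersed, and both combined) can be illustrated with a minimal simulation sketch. This is not the authors' code: the abstract does not report the parameter values or software used, so the function names (`poisson_draw`, `simulate_counts`, `summarize`), the mean of 4, the dispersion parameter of 1, and the 30% structural-zero probability are all assumptions chosen for illustration. The NB counts are drawn as a gamma-Poisson mixture, and the dispersion index (variance divided by mean) is used as a simple diagnostic; it is close to 1 for Poisson data and larger than 1 under zero-inflation or overdispersion.

```python
import math
import random
from statistics import mean, variance


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small to moderate rates used here; not meant for very large lam.
    """
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(n, mu, size=None, pi_zero=0.0, seed=0):
    """Simulate n counts (hypothetical helper, illustrative parameters).

    mu      : mean of the count component
    size    : NB dispersion parameter, variance = mu + mu**2 / size;
              None gives a plain Poisson count component
    pi_zero : probability of a structural (excess) zero
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        if rng.random() < pi_zero:
            counts.append(0)  # structural zero from the zero-inflation process
            continue
        # Gamma-Poisson mixture: a gamma-distributed rate yields NB counts.
        lam = mu if size is None else rng.gammavariate(size, mu / size)
        counts.append(poisson_draw(rng, lam))
    return counts


def summarize(y):
    """Return mean, variance, dispersion index, and share of zeros."""
    m, v = mean(y), variance(y)
    return {
        "mean": m,
        "variance": v,
        "dispersion_index": v / m,
        "zero_share": y.count(0) / len(y),
    }


# Assumed scenarios, one per model family compared in the study.
scenarios = {
    "Poisson": dict(mu=4.0),
    "ZIP": dict(mu=4.0, pi_zero=0.3),
    "NB": dict(mu=4.0, size=1.0),
    "ZINB": dict(mu=4.0, size=1.0, pi_zero=0.3),
}

for name, params in scenarios.items():
    stats = summarize(simulate_counts(5000, seed=42, **params))
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Under these assumed settings, the Poisson scenario should show a dispersion index near 1, while the ZIP scenario exceeds 1 purely because of the extra zeros. That is the case in which the abstract reports that ZIP performs best.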


Identifiers

pubmed: 35927612
doi: 10.1186/s12874-022-01685-8
pii: 10.1186/s12874-022-01685-8
pmc: PMC9351158

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

211

Copyright information

© 2022. The Author(s).


Authors

Gustavo A Fernandez (GA)

School of Mathematical and Statistical Sciences, University of Texas Rio Grande Valley, One West University Boulevard, Brownsville Campus, Brownsville, TX, 78520, USA.

Kristina P Vatcheva (KP)

School of Mathematical and Statistical Sciences, University of Texas Rio Grande Valley, One West University Boulevard, Brownsville Campus, Brownsville, TX, 78520, USA. Kristina.Vatcheva@utrgv.edu.
