Penalized logistic regression with low prevalence exposures beyond high dimensional settings.
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date:
2019
History:
received: 2018-10-30
accepted: 2019-05-05
entrez: 2019-05-21
pubmed: 2019-05-21
medline: 2020-02-06
Status:
epublish
Abstract
Estimating and selecting risk factors with extremely low prevalences of exposure for a binary outcome is challenging because classical techniques, notably logistic regression, often fail to provide meaningful results in such settings. Penalized regression methods are widely used in high-dimensional settings; we show that they are useful in low-dimensional settings as well. Specifically, we demonstrate that Firth correction, ridge, the lasso and boosting all improve estimation for low-prevalence risk factors. Although the methods themselves are well established, comparison studies are needed to assess their potential benefits in this context. We provide such a comparison using data from a large unmatched case-control study from France (2005-2008) on the relationship between prescription medicines and road traffic accidents, together with an accompanying simulation study. Results show that the estimation of risk factors with prevalences below 0.1% can be drastically improved, in particular by Firth correction and boosting, especially for ultra-low prevalences. When a moderate number of low-prevalence exposures is available, we recommend the use of penalized techniques.
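To illustrate why unpenalized logistic regression can fail for a very rare exposure, the following minimal sketch compares the maximum-likelihood log odds ratio with a Firth-type corrected one for a single binary exposure. It is not the authors' analysis: the study fits multivariable models, whereas this sketch uses the special case of one binary exposure, where the Firth-corrected estimate coincides with adding 0.5 to each cell of the 2x2 table. The function names and all counts are hypothetical and chosen only for illustration.

```python
import math


def log_odds_ratio_mle(a, b, c, d):
    """Maximum-likelihood log odds ratio from a 2x2 table.

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    Returns +inf or -inf when an empty cell makes the estimate diverge,
    and nan when the table is too sparse to define it at all.
    """
    numerator = a * d
    denominator = b * c
    if numerator == 0 and denominator == 0:
        return math.nan
    if denominator == 0:
        return math.inf
    if numerator == 0:
        return -math.inf
    return math.log(numerator / denominator)


def log_odds_ratio_firth(a, b, c, d):
    """Firth-type corrected log odds ratio for a single binary exposure.

    In this saturated one-exposure model the Jeffreys-prior penalty
    amounts to adding 0.5 to every cell, so the estimate stays finite.
    """
    return math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))


# Hypothetical sparse table: only 3 of 5000 subjects are exposed (0.06%),
# and none of the exposed subjects is a control.
sparse = (3, 0, 997, 4000)
# Hypothetical well-populated table for comparison.
dense = (300, 200, 700, 3800)

for name, table in (("sparse", sparse), ("dense", dense)):
    print(name,
          "MLE:", log_odds_ratio_mle(*table),
          "Firth:", log_odds_ratio_firth(*table))
```

With the sparse table the maximum-likelihood estimate diverges while the corrected estimate remains finite; with the well-populated table the two estimates nearly coincide, which is consistent with the penalty mattering mainly at low prevalence.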
Identifiers
pubmed: 31107924
doi: 10.1371/journal.pone.0217057
pii: PONE-D-18-31200
pmc: PMC6527211
Chemical substances
Pharmaceutical Preparations
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e0217057
Conflict of interest statement
The authors have declared that no competing interests exist.