On the Complexity of Logistic Regression Models.
Journal
Neural computation
ISSN: 1530-888X
Abbreviated title: Neural Comput
Country: United States
NLM ID: 9426182
Publication information
Publication date: 08 2019
History:
pubmed: 2019-07-02
medline: 2019-07-02
entrez: 2019-07-02
Status: ppublish
Abstract
We investigate the complexity of logistic regression models, which is defined by counting the number of indistinguishable distributions that the model can represent (Balasubramanian, 1997). We find that the complexity of logistic models with binary inputs depends not only on the number of parameters but also on the distribution of inputs in a nontrivial way that standard treatments of complexity do not address. In particular, we observe that correlations among inputs induce effective dependencies among parameters, thus constraining the model and, consequently, reducing its complexity. We derive simple relations for the upper and lower bounds of the complexity. Furthermore, we show analytically that defining the model parameters on a finite support rather than the entire axis decreases the complexity in a manner that critically depends on the size of the domain. Based on our findings, we propose a novel model selection criterion that takes into account the entropy of the input distribution. We test our proposal on the problem of selecting the input variables of a logistic regression model in a Bayesian model selection framework. In our numerical tests, we find that while the reconstruction errors of standard model selection approaches (AIC, BIC, …
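The input-selection experiment described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the synthetic data, the true weights, and the Newton-style fitter below are all assumptions. It enumerates subsets of binary inputs, fits a logistic model to each by maximum likelihood, and scores the subsets with the standard AIC (2k − 2 ln L) and BIC (k ln n − 2 ln L) penalties that the paper compares against.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption): 3 binary inputs, only the first two carry signal.
n = 500
X = rng.integers(0, 2, size=(n, 3)).astype(float)
true_w = np.array([2.0, -1.5, 0.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_w)))).astype(float)

def max_log_likelihood(Xs, y, iters=25):
    """Fit a logistic model by Newton's method; return the maximized log-likelihood."""
    w = np.zeros(Xs.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w)))
        grad = Xs.T @ (y - p)
        # Hessian of the negative log-likelihood, with a tiny ridge for stability.
        hess = (Xs * (p * (1 - p))[:, None]).T @ Xs + 1e-8 * np.eye(Xs.shape[1])
        w += np.linalg.solve(hess, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-(Xs @ w))), 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Score every nonempty input subset; lower is better for both criteria.
scores = {}
for k in range(1, 4):
    for subset in itertools.combinations(range(3), k):
        ll = max_log_likelihood(X[:, list(subset)], y)
        scores[subset] = (len(subset) * np.log(n) - 2 * ll,  # BIC
                          2 * len(subset) - 2 * ll)          # AIC
best = min(scores, key=lambda s: scores[s][0])
print("BIC-selected inputs:", best)
```

On this toy data the informative inputs (0 and 1) are retained, while the extra ln n penalty of BIC discourages including the irrelevant third input; the paper's proposed criterion would replace these penalties with one that also depends on the entropy of the input distribution.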
Identifiers
pubmed: 31260388
doi: 10.1162/neco_a_01207
Publication types
Journal Article
Languages
eng
Citation subsets
IM