Poisson Model (w/ multiple levels X)
Question
Is a Poisson model the best method for predicting counts across the levels of a nominal variable?
Details
Imagine a data set of 7000 observations, where the output is the observed count (numeric: 0, 1, 2, ..., 8) and the feature is location (factor, 13 levels).
Fitting a Poisson regression returns the following output:
## function for glm
p1 <- glm(Count ~ Loc, family = poisson, data = dat)
summary(p1)
Call:
glm(formula = Count ~ Loc, family = "poisson", data = p1)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.49116  -1.32852   0.00775   1.02579   1.55985
Coefficients:
               Estimate Std. Error z value            Pr(>|z|)
(Intercept)    1.112766   0.032880  33.843 <0.0000000000000002 ***
LocLocation E -0.006774   0.039251  -0.173               0.863
LocLocation G -0.005369   0.045309  -0.118               0.906
LocLocation H -0.020067   0.039208  -0.512               0.609
LocLocation L -0.018632   0.044483  -0.419               0.675
LocLocation N -0.032309   0.039875  -0.810               0.418
LocLocation O -0.023674   0.044166  -0.536               0.592
LocLocation P -0.015914   0.044296  -0.359               0.719
LocLocation Q  0.019584   0.039434   0.497               0.619
LocLocation R -0.023361   0.039597  -0.590               0.555
LocLocation S  0.003840   0.039222   0.098               0.922
LocLocation U -0.033202   0.040190  -0.826               0.409
LocLocation V -0.007011   0.044393  -0.158               0.875
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 11647 on 6999 degrees of freedom
Residual deviance: 11642 on 6987 degrees of freedom
AIC: 29771
Number of Fisher Scoring iterations: 5
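As a rough check on the factor as a whole, rather than on each level against the baseline, the two deviances reported above can be compared directly. The sketch below uses only the rounded figures from the summary (null deviance 11647 on 6999 degrees of freedom, residual deviance 11642 on 6987), so the result is approximate; the variable names are illustrative.

```r
# Likelihood-ratio (deviance) test for Loc as a whole, using the rounded
# deviances printed in the summary output. Values are approximate.
null_dev  <- 11647; null_df  <- 6999
resid_dev <- 11642; resid_df <- 6987

lr_stat <- null_dev - resid_dev   # drop in deviance from adding Loc
lr_df   <- null_df - resid_df     # 12 extra parameters (13 levels - 1)

# Upper-tail chi-square probability for the drop in deviance
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p_value = p_value))
```

With the model object at hand, anova(p1, test = "Chisq") should give the same comparison from the unrounded deviances. A large p-value here would agree with the individual coefficients: the counts give little evidence that the mean differs by location.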
Result of dispersion testing:
Overdispersion test
data: p1
z = 19.391, p-value < 0.00000000000000022
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion
1.306968
A couple of thoughts:
- If I am reading the dispersion test output correctly, the model is fine in this respect (the estimated dispersion is greater than 1).
- The coefficient p-values from glm() seem to point away from any meaningful relationship between locations (based on count). This seems odd, as I would expect that, with 7000 observations, at least a couple of the Loc levels would be related.
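On the first point, a dispersion estimate above 1 with a very small p-value is usually read as evidence of overdispersion (variance larger than the mean), which is a departure from the Poisson assumption rather than a sign of good fit. A minimal sketch of how one might check this by hand and adjust the standard errors follows. It runs on simulated data, not the data from the question; the data frame sim, the seed, and the negative binomial parameters are all assumptions chosen to give a dispersion near 1.3.

```r
# Simulated stand-in for the real data (hypothetical): 7000 counts,
# 13 location levels, the same mean everywhere, mild overdispersion.
set.seed(1)
n   <- 7000
sim <- data.frame(
  Loc   = factor(sample(LETTERS[1:13], n, replace = TRUE)),
  Count = rnbinom(n, mu = 3, size = 10)  # variance = mu + mu^2/size = 3.9
)

fit_pois <- glm(Count ~ Loc, family = poisson, data = sim)

# Pearson dispersion estimate: Pearson chi-square / residual df.
# Values well above 1 suggest overdispersion.
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) /
  df.residual(fit_pois)

# Quasi-Poisson keeps the same coefficients but scales the standard
# errors by sqrt(dispersion).
fit_quasi <- glm(Count ~ Loc, family = quasipoisson, data = sim)

se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]

print(dispersion)
print(head(cbind(se_pois, se_quasi)))
```

Because overdispersion makes the Poisson standard errors too small, correcting for it would be expected to make the location p-values larger, not smaller.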
Is Poisson the right choice for modeling such data? Given that the data are counts, I hesitate to apply other methods (although I did try extensions, such as zero-inflated models, with similar results). The literature I have found offers few alternatives to Poisson, and even fewer for identifying relationships between the counts at multiple levels of a categorical (nominal) variable.
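One commonly suggested alternative for overdispersed counts is the negative binomial model. The sketch below uses glm.nb() from the MASS package and compares it with the Poisson fit by AIC. It runs on simulated data, not the data from the question; the data frame sim, the seed, and the simulation parameters are hypothetical.

```r
library(MASS)  # provides glm.nb()

# Hypothetical simulated data: 13 locations with a common mean and
# mild overdispersion relative to Poisson.
set.seed(2)
n   <- 7000
sim <- data.frame(
  Loc   = factor(sample(LETTERS[1:13], n, replace = TRUE)),
  Count = rnbinom(n, mu = 3, size = 10)
)

fit_pois <- glm(Count ~ Loc, family = poisson, data = sim)
fit_nb   <- glm.nb(Count ~ Loc, data = sim)

# Lower AIC indicates the better trade-off between fit and complexity
print(AIC(fit_pois, fit_nb))

# Estimated negative binomial size parameter (theta)
print(fit_nb$theta)
```

A change of distribution addresses the variance, not the mean structure: if the locations really have similar mean counts, the Loc coefficients would be expected to stay near zero under either model.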
Thoughts/suggestions would be welcome.