Poisson Model (w/ multiple levels X)

Question

Is a Poisson model the best method for predicting counts across the multiple levels of a nominal variable?

Details
Imagine a data set of 7000 observations, where the output is Obs.Count {numeric: 0, 1, 2, ..., 8} and the only feature is location {factor, 13 levels}.
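To make the setup concrete, here is a minimal sketch that simulates data of the same shape and fits the same model. The level names, the seed, and the rate of 3 are placeholders chosen for illustration; they are not the real data.

```r
# Simulated stand-in for the data described above (all values hypothetical)
set.seed(1)
n <- 7000
loc_levels <- paste("Location", LETTERS[1:13])  # 13 made-up level names

dat <- data.frame(
  Loc   = factor(sample(loc_levels, n, replace = TRUE), levels = loc_levels),
  Count = pmin(rpois(n, lambda = 3), 8)          # counts capped at 8, as in the description
)

# Poisson regression of the count on the 13-level factor
p1 <- glm(Count ~ Loc, family = poisson, data = dat)

# One intercept (reference level) plus 12 contrasts against it
length(coef(p1))
```

Because the first level acts as the reference, the model reports 12 location coefficients rather than 13.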

When conducting Poisson regression, the output returns:


## Fit the Poisson GLM
p1 <- glm(Count ~ Loc, family = poisson, data = dat)

Call:
glm(formula = Count ~ Loc, family = "poisson", data = p1)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.49116  -1.32852   0.00775   1.02579   1.55985  

Coefficients:
               Estimate Std. Error z value            Pr(>|z|)    
(Intercept)    1.112766   0.032880  33.843 <0.0000000000000002 ***
LocLocation E -0.006774   0.039251  -0.173                0.863    
LocLocation G -0.005369   0.045309  -0.118                0.906    
LocLocation H -0.020067   0.039208  -0.512                0.609    
LocLocation L -0.018632   0.044483  -0.419                0.675    
LocLocation N -0.032309   0.039875  -0.810                0.418    
LocLocation O -0.023674   0.044166  -0.536                0.592    
LocLocation P -0.015914   0.044296  -0.359                0.719    
LocLocation Q  0.019584   0.039434   0.497                0.619    
LocLocation R -0.023361   0.039597  -0.590                0.555    
LocLocation S  0.003840   0.039222   0.098                0.922    
LocLocation U -0.033202   0.040190  -0.826                0.409    
LocLocation V -0.007011   0.044393  -0.158                0.875    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 11647  on 6999  degrees of freedom
Residual deviance: 11642  on 6987  degrees of freedom
AIC: 29771

Number of Fisher Scoring iterations: 5
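The null and residual deviances above allow a rough overall test of whether Loc matters at all, rather than reading the 12 coefficients one by one. The sketch below plugs the printed (rounded) deviances into a chi-square likelihood-ratio test; the variable names are illustrative, and the result is only approximate because of the rounding.

```r
# Values copied from the summary output above (rounded as printed)
null_dev  <- 11647   # deviance of the intercept-only model
resid_dev <- 11642   # deviance of the model with Loc
df_diff   <- 6999 - 6987  # 12 extra parameters for the 13-level factor

# Likelihood-ratio statistic: drop in deviance from adding Loc
lr_stat <- null_dev - resid_dev

# Upper-tail chi-square probability
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
p_value
```

With these rounded inputs the p-value comes out at roughly 0.96, which is consistent with the individual coefficients: adding Loc barely reduces the deviance. On the fitted object itself, anova(p1, test = "Chisq") would give the same kind of test from the unrounded values.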

Result of dispersion testing:

    Overdispersion test

data:  p1
z = 19.391, p-value < 0.00000000000000022
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion 
  1.306968 
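A dispersion estimate above 1 with a very small p-value is usually read as evidence of overdispersion, meaning the variance of the counts exceeds what a Poisson model assumes. The output above looks like it comes from a function such as dispersiontest() in the AER package, which is an assumption on my part. The sketch below avoids that dependency and uses base R only: it simulates overdispersed counts (hypothetical data, not the real set, and not capped at 8), estimates the dispersion from Pearson residuals, and refits with a quasi-Poisson family.

```r
# Hypothetical overdispersed data: negative binomial with variance/mean = 1.3
set.seed(42)
n <- 7000
loc_levels <- paste("Location", LETTERS[1:13])
dat <- data.frame(
  Loc   = factor(sample(loc_levels, n, replace = TRUE), levels = loc_levels),
  Count = rnbinom(n, mu = 3, size = 10)   # variance = mu + mu^2 / size = 3.9
)

fit_pois <- glm(Count ~ Loc, family = poisson, data = dat)

# Pearson-based dispersion estimate: close to 1 if the Poisson variance holds
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Quasi-Poisson keeps the same coefficients but rescales the standard errors
fit_quasi <- glm(Count ~ Loc, family = quasipoisson, data = dat)
se_pois   <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi  <- summary(fit_quasi)$coefficients[, "Std. Error"]

# The standard errors grow by sqrt(dispersion)
se_ratio <- se_quasi / se_pois
```

The point of the comparison is that overdispersion leaves the estimated location effects unchanged and only widens their standard errors, so correcting for it makes the location coefficients less significant, not more.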

A couple of thoughts:

  1. If I am reading the dispersion output correctly, the model is good in this respect (true dispersion is >1).
  2. The coefficient p-values from glm() seem to point away from any meaningful relationship between locations (based on count). This seems odd, as I would expect that, with 7000 observations, at least a couple of the Loc levels would be related.
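One way to put the coefficients in the second point on an interpretable scale is to exponentiate them. With a log link, exp(intercept) is the expected count at the reference location and exp(coefficient) is the ratio of each location's expected count to that reference. The sketch below uses the estimates printed in the summary; the vector names are illustrative.

```r
# Estimates copied from the coefficient table above
intercept <- 1.112766
coefs <- c(E = -0.006774, G = -0.005369, H = -0.020067, L = -0.018632,
           N = -0.032309, O = -0.023674, P = -0.015914, Q =  0.019584,
           R = -0.023361, S =  0.003840, U = -0.033202, V = -0.007011)

# Expected count at the reference location
baseline_mean <- exp(intercept)

# Rate ratio of each location relative to the reference
rate_ratios <- exp(coefs)

# Smallest and largest ratio
range(rate_ratios)
```

The baseline works out to about 3.04 counts, and every rate ratio lies between roughly 0.97 and 1.02. In other words, the estimated mean counts differ from the reference by only a few percent, so the large p-values reflect locations with nearly identical average counts rather than a shortage of data. Note that each coefficient only compares a location with the reference level, not locations with each other.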

Is Poisson the right choice for modeling such data? Because the outcome is a count, I hesitate to apply other methods (although I did try extensions such as a zero-inflated model, with similar results). The literature I have found offers few alternatives to Poisson, and even fewer for identifying relationships between counts across multiple levels of a categorical (nominal) variable.

Thoughts/suggestions would be welcome.

