Poisson model with overdispersion
I'm working with a dataset $X$ (of length $N$) of count data, which looks like:
I have developed a statistical model that could probably be improved, so I'm asking for any suggestions: different likelihoods or priors, a different approach, anything.
My model
I'm trying to estimate the parameters of the likelihood of the data so that I can get a posterior predictive distribution, credible intervals, and so on. In other words, I want to model the generative process of the data given some parameters, $f(X|\theta)$.
The data show large overdispersion ($\bar X < var(X)$), so a Poisson likelihood, $X \sim \text{Poisson}(\lambda)$, is not a good choice: it forces the variance to equal the mean ($\lambda$).
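A quick way to quantify the overdispersion is the index of dispersion, $var(X)/\bar X$, which should be close to 1 for Poisson data. A minimal sketch using only the Python standard library; the counts below are made-up toy values, not the data from the question:

```python
from statistics import fmean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean.

    Roughly 1 for Poisson data; values well above 1 indicate overdispersion.
    """
    m = fmean(counts)
    v = variance(counts)  # sample variance, n - 1 denominator
    return v / m


# Hypothetical toy counts for illustration; these are NOT the data from the question.
toy = [0, 1, 1, 2, 3, 5, 8, 13, 0, 2]
print(f"mean={fmean(toy):.2f}, var={variance(toy):.2f}, index={dispersion_index(toy):.2f}")
```

For these toy counts the index is about 4.9, far above the value of 1 a Poisson model implies.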
Reading the literature on count data with overdispersion, I decided to model the data with a Negative Binomial distribution, $X \sim \text{NB}(r, p)$.
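Assuming the parameterization in which $X$ counts the failures before the $r$-th success with success probability $p$ (the one under which the Beta posterior in the set-up is conjugate), the pmf and first two moments are:

$$ f(x \mid r, p) = \frac{\Gamma(x + r)}{x!\,\Gamma(r)}\, p^{r} (1-p)^{x}, \qquad x = 0, 1, 2, \dots $$

$$ \mu = \frac{r(1-p)}{p}, \qquad \sigma^2 = \frac{r(1-p)}{p^2} = \mu + \frac{\mu^2}{r} $$

Since $\sigma^2 > \mu$ for every finite $r$, the NB can absorb overdispersion, and solving $\sigma^2 = \mu + \mu^2/r$ for $r$ gives the moment relation used for $\hat r$.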
Parameter estimation
To avoid ending up with a very complex set-up, I performed Bayesian estimation of the parameter $p$ only, and computed $r$ from the data: in a Negative Binomial distribution, $r$ is related to the first and second moments of the distribution by
$r = \frac{\mu^2}{\sigma^2 - \mu}$, so
$\hat r = \frac{\bar X^2}{var(X) - \bar X}$
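The moment estimate above can be sketched in a few lines of standard-library Python. Two assumptions: `var(X)` is taken to be the sample variance (with an $n-1$ denominator), and the function name `nb_r_moments` is hypothetical. The guard matters because the formula only gives a finite, positive $\hat r$ when the variance exceeds the mean:

```python
from statistics import fmean, variance


def nb_r_moments(counts):
    """Method-of-moments estimate r_hat = mean^2 / (var - mean) for a negative binomial."""
    m = fmean(counts)
    # Assumption: sample variance (n - 1 denominator); the question's var(X) could also be
    # the population variance, which gives a slightly different r_hat for small N.
    v = variance(counts)
    if v <= m:
        # Without overdispersion the formula gives a non-positive r_hat or divides by zero.
        raise ValueError("variance must exceed the mean for a finite, positive r_hat")
    return m * m / (v - m)


toy = [0, 1, 1, 2, 3, 5, 8, 13, 0, 2]  # hypothetical counts, not the question's data
print(nb_r_moments(toy))
```

Because $\hat r$ is a plug-in value, its own uncertainty does not propagate into the posterior for $p$ or into the predictive distribution.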
The whole set-up is:
- Likelihood: $f(X|p) = \prod_{i=1}^{N} \text{NB}(X_i \mid \hat r, p)$
- Prior: $f(p) = \text{Beta}(0, 0)$ (non-informative, improper prior)
- Posterior: $f(p|X) = \text{Beta}(0 + \hat r N, 0 + \sum_{i=1}^{N} X_i)$
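The posterior in the list follows because, with $r$ fixed at $\hat r$, the NB likelihood in $p$ is proportional to $p^{\hat r N}(1-p)^{\sum X_i}$, which is a Beta kernel. A minimal sketch of the update; the function names and inputs are hypothetical, and the prior defaults to the $\text{Beta}(0, 0)$ of the set-up:

```python
def beta_posterior(counts, r_hat, a0=0.0, b0=0.0):
    """Conjugate update for p when X_i ~ NB(r_hat, p) with r_hat held fixed.

    The NB likelihood in p is proportional to p**(r_hat*N) * (1 - p)**sum(X), so a
    Beta(a0, b0) prior gives a Beta(a0 + r_hat*N, b0 + sum(X)) posterior.
    """
    a = a0 + r_hat * len(counts)
    b = b0 + sum(counts)
    if a <= 0 or b <= 0:
        # With the improper Beta(0, 0) prior this happens e.g. when every count is 0.
        raise ValueError("posterior Beta(a, b) is improper: need a > 0 and b > 0")
    return a, b


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# Hypothetical inputs for illustration only.
a, b = beta_posterior([0, 1, 1, 2, 3, 5, 8, 13, 0, 2], r_hat=0.9)
print(a, b, beta_mean_var(a, b))
```

With the improper prior, the posterior is proper as long as $\hat r N > 0$ and at least one count is nonzero.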
This yields the following posterior predictive distribution:
The first and second moments of the posterior predictive distribution are very close to those of the data (I let the data dominate the posterior by choosing a non-informative prior). Also, the plug-in posterior predictive (using the point estimate $\mu_p$ of $p$) does not differ from the posterior predictive averaged over all possible values of $p$.
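The two predictives can be compared directly. Averaging $\text{NB}(\hat r, p)$ over $p \sim \text{Beta}(a, b)$ gives a beta-negative-binomial pmf, $P(X=k) = \frac{\Gamma(k+\hat r)}{k!\,\Gamma(\hat r)} \frac{B(a+\hat r,\, b+k)}{B(a, b)}$, while the plug-in version uses a single value of $p$. A sketch, assuming $\mu_p$ is the posterior mean $a/(a+b)$ and using hypothetical values for $\hat r$, $a$, $b$:

```python
from math import exp, lgamma, log, log1p


def log_beta(x, y):
    """log B(x, y) via log-gamma, for numerical stability."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_nb_coef(k, r):
    """log of Gamma(k + r) / (k! Gamma(r)); r may be non-integer."""
    return lgamma(k + r) - lgamma(k + 1) - lgamma(r)


def nb_pmf(k, r, p):
    """NB(r, p): probability of k failures before the r-th success."""
    return exp(log_nb_coef(k, r) + r * log(p) + k * log1p(-p))


def beta_nb_pmf(k, r, a, b):
    """Posterior predictive: NB(r, p) averaged over p ~ Beta(a, b) (beta-negative-binomial)."""
    return exp(log_nb_coef(k, r) + log_beta(a + r, b + k) - log_beta(a, b))


# Hypothetical values: r_hat and a posterior Beta(a, b) as in the set-up above.
r_hat, a, b = 0.9, 9.0, 35.0
p_point = a / (a + b)  # assumption: mu_p is the posterior mean of p

for k in range(6):
    print(k, round(nb_pmf(k, r_hat, p_point), 4), round(beta_nb_pmf(k, r_hat, a, b), 4))

# Analytic means: plug-in r*b/a versus fully averaged r*b/(a - 1), valid for a > 1.
print(r_hat * b / a, r_hat * b / (a - 1))
```

The means differ by a factor $a/(a-1)$ with $a = \hat r N$, so for large $N$ the two predictives nearly coincide, which fits the observation above. The averaged version still has a heavier (polynomial rather than geometric) right tail, which can matter for upper credible bounds.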
Once again, any suggestions for improvement?
EDIT
What about a zero-truncated negative binomial distribution?
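A zero-truncated NB renormalizes the NB over $k \ge 1$: $P(X=k \mid X>0) = f(k)/(1 - p^r)$, since $f(0) = p^r$ in the parameterization above. A minimal sketch with hypothetical function names; it only makes sense if a count of 0 cannot occur (if zeros do occur but in the wrong proportion, zero-inflated or hurdle models are the usual alternatives):

```python
from math import exp, lgamma, log, log1p


def nb_pmf(k, r, p):
    """NB(r, p): probability of k failures before the r-th success."""
    log_coef = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coef + r * log(p) + k * log1p(-p))


def zt_nb_pmf(k, r, p):
    """Zero-truncated NB: P(X = k | X > 0) = f(k) / (1 - f(0)), with f(0) = p**r."""
    if k < 1:
        return 0.0
    return nb_pmf(k, r, p) / (1.0 - p ** r)


def zt_nb_mean(r, p):
    """E[X | X > 0] = E[X] / P(X > 0), because k = 0 contributes nothing to E[X]."""
    return (r * (1.0 - p) / p) / (1.0 - p ** r)


def zt_nb_loglik(counts, r, p):
    """Log-likelihood of strictly positive counts under the zero-truncated NB."""
    if any(k < 1 for k in counts):
        raise ValueError("zero-truncated model requires all counts >= 1")
    return sum(log(zt_nb_pmf(k, r, p)) for k in counts)
```

Note that the extra factor $(1 - p^r)^{-N}$ in the likelihood breaks the Beta conjugacy for $p$, so the posterior would need numerical integration or MCMC instead of the closed form used in the set-up.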
Topic poisson bayesian predictive-modeling
Category Data Science