Learning with a Dirichlet prior - probabilistic graphical models exercise

I have the following problem: Suppose we are interested in estimating the distribution over the English letters. We assume an alphabet that consists of 26 letters and the space symbol, and we ignore all other punctuation and the upper/lower case distinction. We model the distribution over the 27 symbols as a multinomial parametrized by $\theta = (\theta_1, ..., \theta_{27})$, where $\sum_i \theta_i = 1$ and all $\theta_i \geq 0$.

Now we go to Stanford's Green Library and repeat the following experiment: randomly pick up a book, open a page, pick a spot on the page, and write down the nearest symbol that is in our alphabet. We use $X[m]$ to denote the letter we obtain in the $m$th experiment.

In the end we have collected a database $D = \{x[1], ..., x[2000]\}$ consisting of 2000 symbols, among which "a" appears 100 times and "p" appears 87 times. We use a Dirichlet prior over $\theta$, i.e. $P(\theta) = Dirichlet(\alpha_1, ..., \alpha_{27})$, where each $\alpha_i = 10$.

Suppose we draw two more samples, $X[2001]$ and $X[2002]$. If we use $\alpha_i = 10$ for all $i$, what is $P(X[2001] = "p", X[2002] = "a" \mid D)$?

I thought we could compute $P(x[2001]=p \mid D) \times P(x[2002]=a \mid D) = \frac{10+87}{270+2001} \times \frac{10+100}{270+2002}$, but it's wrong.

Recall the formula: $P(x \mid u, D) = \frac{\alpha_{x,u} + M[x,u]}{\alpha_u + M[u]}$


Your formula is correct, but the final computation is wrong. It should be: $\frac{10+87}{270+2000} \times \frac{10+100}{270+2001}$

By the chain rule, $P(X[2001]=p, X[2002]=a \mid D) = P(X[2001]=p \mid D) \times P(X[2002]=a \mid D, X[2001]=p)$. The first factor conditions on the 2000 symbols in $D$ (so the denominator is $\sum_i \alpha_i + M = 270 + 2000$), and the second factor conditions on 2001 symbols. Observing "p" does not change the count of "a", so the numerator of the second factor is still $10 + 100$.
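As a quick sanity check, here is a short Python sketch of the calculation, using only the numbers given in the exercise (27 symbols, $\alpha_i = 10$, 2000 observations with 100 "a"s and 87 "p"s):

```python
# Posterior predictive probability of two successive draws
# under a symmetric Dirichlet(10, ..., 10) prior on 27 symbols.

alpha = 10                 # hyperparameter alpha_i for every symbol
K = 27                     # alphabet size (26 letters + space)
alpha_total = alpha * K    # sum of all alpha_i = 270
M = 2000                   # number of observed symbols in D
count_a = 100              # occurrences of "a" in D
count_p = 87               # occurrences of "p" in D

# Chain rule: P(X[2001]=p, X[2002]=a | D)
#   = P(X[2001]=p | D) * P(X[2002]=a | D, X[2001]=p)
p_first = (alpha + count_p) / (alpha_total + M)        # 97 / 2270
# The first draw adds one observation ("p"), leaving count_a unchanged.
p_second = (alpha + count_a) / (alpha_total + M + 1)   # 110 / 2271
p_joint = p_first * p_second

print(p_joint)
```

The key point the code makes explicit is that the two factors are not conditioned on the same amount of data: the second draw sees 2001 observations, not 2002.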
