How to get the maximum likelihood estimate of the categorical distribution parameters using Lagrange optimization?

Let's say our data is discrete-valued and each observation belongs to one of $K$ classes. The underlying probability distribution is assumed to be a categorical/multinoulli distribution, $p(\mathbf{x}) = \prod_{k = 1}^{K}\mu_{k}^{x_{k}}$, where $\mathbf{x} = [x_{1}\; x_{2}\; \dots\; x_{K}]^{T}$ is a one-hot vector and $\boldsymbol{\mu} = [\mu_{1}\; \dots\; \mu_{K}]^{T}$ are the parameters, with $\mu_{k} \geq 0$ and $\sum_{k=1}^{K} \mu_{k} = 1$.

Suppose $D = \{\mathbf{x}_{1}, \, \mathbf{x}_{2}, \, \dots, \, \mathbf{x}_{N}\}$ is our data.



The log likelihood is:

$\log p(D|\boldsymbol{\mu}) = \sum_{k = 1}^{K} m_{k} \log{\mu_{k}}$

where $m_{k} = \sum_{n = 1}^{N} x_{nk}$ is the number of observations in class $k$ (with $x_{nk}$ denoting the $k$-th component of $\mathbf{x}_{n}$).
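For concreteness, the counts $m_{k}$ are just the column sums of the stacked one-hot vectors. A small sketch with made-up data (the matrix `X` below is hypothetical, not from any real dataset):

```python
import numpy as np

# Hypothetical toy data: N = 5 one-hot observations over K = 3 classes.
X = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])

# m_k = sum over n of x_{nk}: the per-class counts.
m = X.sum(axis=0)
print(m)  # [1 3 1]
```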

To get the MLE solution, we have to solve the following optimization problem:

$\max_{\boldsymbol{\mu}} \sum_{k = 1}^{K} m_{k} \log{\mu_{k}} \hskip 1em \text{such that} \hskip 1em \mu_{k} \geq 0, \hskip 0.5em \sum_{k = 1}^{K} \mu_{k} = 1$
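As a numerical sanity check on what the optimum should look like, I compared the objective at the empirical frequencies $\mu_{k} = m_{k}/N$ (the answer I expect) against random feasible points on the probability simplex. A rough sketch, using hypothetical counts:

```python
import numpy as np

rng = np.random.default_rng(0)

m = np.array([1.0, 3.0, 1.0])  # hypothetical class counts, N = 5
N = m.sum()

def log_lik(mu):
    """Objective: sum_k m_k * log(mu_k)."""
    return np.sum(m * np.log(mu))

mu_hat = m / N  # candidate maximizer: empirical class frequencies

# Draw random points from the simplex (Dirichlet samples satisfy the
# constraints mu_k >= 0, sum_k mu_k = 1) and check none beats mu_hat.
best_other = max(log_lik(rng.dirichlet(np.ones(3))) for _ in range(10_000))
print(log_lik(mu_hat) >= best_other)  # True
```

This supports the guess, but I still want the derivation via the Lagrangian rather than a numerical check.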

To solve this, I recast it as minimization of the negative log likelihood and write the following Lagrangian:

$L(\boldsymbol{\mu}, \mathbf{u}, v) = -\sum_{k = 1}^{K} m_{k} \log{\mu_{k}} - \sum_{k = 1}^{K} u_{k}\mu_{k} + v\left( \sum_{k = 1}^{K}\mu_{k} - 1\right)$

The primal problem formulation is then

$\boldsymbol{\hat{\mu}} = \arg\inf_{\boldsymbol{\mu}} \sup_{u_{k} \geq 0, \, v} L(\boldsymbol{\mu}, \mathbf{u}, v)$

I have no idea how to proceed from here. How do I solve this primal problem?

