How to get the maximum likelihood estimate of the categorical distribution parameters using Lagrange optimization?

Let's say our data is discrete-valued and each observation belongs to one of $K$ classes. The underlying probability distribution is assumed to be a categorical/multinoulli distribution, $p(\mathbf{x}) = \prod_{k = 1}^{K}\mu_{k}^{x_{k}}$, where $\mathbf{x} = [x_{1}\; x_{2}\; \dots\; x_{K}]^{T}$ is a one-hot vector and $\boldsymbol{\mu} = [\mu_{1}\; \dots\; \mu_{K}]^{T}$ are the parameters, with $\mu_{k} \geq 0$ and $\sum_{k=1}^{K} \mu_{k} = 1$.

Suppose $D = \{\mathbf{x}_{1}, \, \mathbf{x}_{2}, \, \dots, \, \mathbf{x}_{N}\}$ is our data.



The log likelihood is:

$\log p(D|\boldsymbol{\mu}) = \sum_{k = 1}^{K} m_{k} \log{\mu_{k}}$

where $m_{k} = \sum_{n = 1}^{N} x_{nk}$ is the number of observations in class $k$ (with $x_{nk}$ denoting the $k$-th component of $\mathbf{x}_{n}$).
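For concreteness, the counts $m_{k}$ are just the column sums of the stacked one-hot vectors. A small sketch with made-up data (the matrix `X` below is hypothetical, not from any real dataset):

```python
import numpy as np

# Hypothetical toy data: N = 5 one-hot observations over K = 3 classes.
X = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])

# m_k = sum over n of x_{nk}: the per-class counts.
m = X.sum(axis=0)
print(m)  # [1 3 1]
```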

To get the MLE solution, we have to solve the following optimization problem:

$\max_{\boldsymbol{\mu}} \sum_{k = 1}^{K} m_{k} \log{\mu_{k}} \hskip 1em \text{such that} \hskip 1em \mu_{k} \geq 0, \hskip 0.5em \sum_{k = 1}^{K} \mu_{k} = 1$
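As a numerical sanity check on what the optimum should look like, I compared the objective at the empirical frequencies $\mu_{k} = m_{k}/N$ (the answer I expect) against random feasible points on the probability simplex. A rough sketch, using hypothetical counts:

```python
import numpy as np

rng = np.random.default_rng(0)

m = np.array([1.0, 3.0, 1.0])  # hypothetical class counts, N = 5
N = m.sum()

def log_lik(mu):
    """Objective: sum_k m_k * log(mu_k)."""
    return np.sum(m * np.log(mu))

mu_hat = m / N  # candidate maximizer: empirical class frequencies

# Draw random points from the simplex (Dirichlet samples satisfy the
# constraints mu_k >= 0, sum_k mu_k = 1) and check none beats mu_hat.
best_other = max(log_lik(rng.dirichlet(np.ones(3))) for _ in range(10_000))
print(log_lik(mu_hat) >= best_other)  # True
```

This supports the guess, but I still want the derivation via the Lagrangian rather than a numerical check.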

To solve this, I recast it as minimization of the negative log likelihood and write the following Lagrangian:

$L(\boldsymbol{\mu}, \mathbf{u}, v) = -\sum_{k = 1}^{K} m_{k} \log{\mu_{k}} - \sum_{k = 1}^{K} u_{k}\mu_{k} + v\left( \sum_{k = 1}^{K}\mu_{k} - 1\right)$

The primal problem formulation is then

$\boldsymbol{\hat{\mu}} = \arg\inf_{\boldsymbol{\mu}} \sup_{u_{k} \geq 0, \, v} L(\boldsymbol{\mu}, \mathbf{u}, v)$

I have no idea how to proceed from here. How do I solve this primal problem?

