How to build a generative model when we have more than one variable

I have a data frame which looks similar to this:

A   B   C
1   2   2
2   4   3
4   8   5
9   16  7
16  32  11
22  43  14
28  55  17
34  67  20
40  79  23

A, B and C can be thought of as features in the machine-learning sense. I have read about maximum likelihood estimation for one variable assuming a Gaussian distribution.

The likelihood equation is something like this, where the $x_i$'s are the data points:

$$L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta) \tag{1}$$

Here $x_1, x_2, \ldots, x_n$ are $n$ data points, each having dimension 3. If we assume $p(x)$ to be Gaussian, then we can use the normal distribution density:

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

This is easy to understand when we have only one feature.

How can I generalise the normal distribution equation above when we have more than one feature (here, three)? Can someone help me write the maximum likelihood for the above data frame?

Do we learn a $\mu$ and $\sigma$ for each of the features A, B and C, i.e. a total of 6 learnable parameters?

If we have 3 different distributions, say normal, exponential and so on for columns A, B and C, then what does the MLE equation look like over the entire data frame?

If we take the argmax of equation 1, we don't require any ground truth for it, right? We are just maximising the equation?
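To make the question concrete, here is a minimal numpy sketch of what I mean by learning a $\mu$ and $\sigma$ per column, assuming the three columns are modelled as independent Gaussians (the data is the frame above; the independence assumption is mine):

```python
import numpy as np

# Data from the frame above: columns A, B, C
X = np.array([
    [1, 2, 2], [2, 4, 3], [4, 8, 5], [9, 16, 7], [16, 32, 11],
    [22, 43, 14], [28, 55, 17], [34, 67, 20], [40, 79, 23],
], dtype=float)
n = X.shape[0]

# MLE for an independent Gaussian per column: mu_j is the sample mean,
# sigma_j^2 is the biased sample variance (divide by n, not n - 1)
mu = X.mean(axis=0)
sigma2 = X.var(axis=0)          # ddof=0 gives the MLE

# Total log-likelihood of the frame under this model: a sum over
# columns and rows of log N(x_ij | mu_j, sigma_j^2)
loglik = -0.5 * n * np.sum(np.log(2 * np.pi * sigma2)) \
         - 0.5 * np.sum((X - mu) ** 2 / sigma2)
print(mu, sigma2, loglik)
```

Under this independence assumption there are indeed 6 learnable parameters. If the columns had different families (say normal for A, exponential for B), the total log-likelihood would still be the sum of each column's log-likelihood under its own family.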

Topic generative-models machine-learning

Category Data Science


First, I'd like to clarify: the maximum likelihood function you gave there is NOT with respect to "one feature". Or at the very least, it's not meaningful to think of it as a "feature", because in your example you would actually evaluate the log-likelihood function at observed values of $y$, not $x$. You aren't using the variables in $x$ to explain anything about $y$. The resulting maximum likelihood estimate you would get for $\mu$ would be $\bar y$, the sample mean of your target variable $y$.

Now, suppose you actually do want to use $x$ (your features) to predict $y$, as in supervised learning. Then, as I alluded to in the comments, you need to specify two things: the predictor or model function, typically denoted $\hat f(X)$, and the "link function", denoted $g$.

$\hat f(X)$ is a function of your predictor variables such that:

$$g(E[Y|X]) = \hat f(X)$$

In your case, $E[Y | X] = \mu $ since you have a Normal distribution. Hence,

$$g(\mu) = \hat f(X) \rightarrow \mu = g^{-1}(\hat f(X))$$

Now, the choice of $\hat f(X)$ depends on your goals and how complex you wish the model to be. Regardless, it is a function that can take on any real number. In the standard case (say, linear regression) you set $\hat f(X) = B_{0} + B_{1}X_{1} + B_{2}X_{2} + B_{3}X_{3}$. There are other cases where writing out $\hat f(X)$ explicitly is impossible or tiresome, for example gradient-boosted trees or deep neural networks. Other algorithms may set $\hat f(X) = B_{0} + h_1(x_{1}) + h_2(x_{2}) + h_3(x_{3})$, where the $h_{i}$ are smooth functions. It really depends on how complex you wish to get and how interpretable you need your models to be.
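As an illustration (not part of the original answer), here is a small numpy sketch of the two predictor forms just mentioned, with made-up coefficients `B` and made-up smooth functions `h_i`:

```python
import numpy as np

# Hypothetical coefficients for the linear predictor (made up)
B = np.array([0.5, 1.0, -0.2, 2.0])   # B0, B1, B2, B3

def f_hat_linear(x):
    """Linear predictor: B0 + B1*x1 + B2*x2 + B3*x3."""
    return B[0] + x @ B[1:]

def f_hat_additive(x, h=(np.sqrt, np.log1p, np.tanh)):
    """Additive predictor: B0 + h1(x1) + h2(x2) + h3(x3),
    with arbitrary (here made-up) smooth functions h_i."""
    return B[0] + sum(hi(x[..., i]) for i, hi in enumerate(h))

x = np.array([1.0, 2.0, 3.0])          # one row of features
print(f_hat_linear(x))                 # -> 7.1
print(f_hat_additive(x))
```

Both return a single real number for a row of features; only the functional form of $\hat f$ differs.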

With respect to $g$, the "link function", this is almost always chosen based on the range of the response variable, or the range of the parameter you are linking to ($\mu$ in your case). Since a normal response can take on any real number, $g$ is usually chosen as the identity function, because $\hat f(X)$ already naturally takes on any real number. This leads to:

$$\mu = B_{0} + B_{1}X_{1} + B_{2}X_{2} + B_{3}X_{3}.$$

Finally, when dealing with your likelihood function:

$$p(Y \mid X) = \text{const} \times \prod_{i = 1}^{n} \exp\left(-\frac{(y_{i} - (B_{0} + B_{1}X_{i1} + B_{2}X_{i2} + B_{3}X_{i3}))^2}{2\sigma^2}\right)$$

Maximising this likelihood gives you the ordinary least squares estimates, which I am sure you have seen before. Of course, choosing a different $\hat f(X)$ or a different $g$ will likely change everything, and often you will not get the nice closed-form solutions that come with ordinary least squares. This motivates different numerical optimisation methods. However, the "ingredients" are the same.
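To see the equivalence concretely, here is a sketch with simulated (hypothetical) data: maximising the Gaussian likelihood in the $B$'s is the same as minimising the sum of squared residuals, which `np.linalg.lstsq` solves directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 3 features, Gaussian noise around a linear mean
n = 200
X = rng.normal(size=(n, 3))
true_B = np.array([1.5, -2.0, 0.5, 3.0])          # B0, B1, B2, B3 (made up)
y = true_B[0] + X @ true_B[1:] + rng.normal(scale=0.3, size=n)

# OLS / Gaussian MLE: append a column of ones for the intercept B0
X1 = np.column_stack([np.ones(n), X])
B_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Minimising the squared residuals is exactly maximising the Gaussian
# log-likelihood in B, since sigma^2 only rescales the objective
resid = y - X1 @ B_hat
sigma2_hat = np.mean(resid ** 2)                  # MLE of sigma^2
print(B_hat, sigma2_hat)
```

With enough data the fitted coefficients land close to the true ones, and the residual variance estimate approaches the noise variance used in the simulation.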

I hope this helps.
