Why is cross entropy based on Bernoulli or Multinoulli probability distribution?

When we use logistic regression, we use cross entropy as the loss function. However, based on my understanding and https://machinelearningmastery.com/cross-entropy-for-machine-learning/, cross entropy measures how similar two or more distributions are to each other, and the distributions are assumed to be Bernoulli or Multinoulli.

So, my question is: why can we always use cross entropy, i.e., a Bernoulli distribution, in regression problems? Do the real values and the predicted values always follow such a distribution?

Topic logistic cross-entropy bernoulli loss-function regression

Category Data Science


In logistic regression, you assume that each target value follows a Bernoulli distribution - takes on value 1 with some probability $p$, and 0 with probability $1-p$. Your model predicts that the target takes on value 1 with some probability $\hat{p}$, and 0 with probability $1-\hat{p}$. You are in some sense comparing these two distributions, predicted and actual, by using log loss, yes.
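As a concrete illustration (a minimal sketch in plain NumPy, with made-up targets and predicted probabilities), this is the log loss that compares the actual Bernoulli targets with the predicted $\hat{p}$:

```python
import numpy as np

y = np.array([1, 0, 1, 1])               # actual targets, each assumed to be Bernoulli-distributed
p_hat = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probabilities of the class "1"

# Cross entropy between the actual distribution (y, 1 - y) and the predicted
# distribution (p_hat, 1 - p_hat), averaged over the samples.
log_loss = -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
print(log_loss)  # ~0.30
```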

There is a 'regression' here; logistic regression is just a generalized linear model with a logit link function. You could say you are regressing the log-odds (which you then turn into a probability with the inverse link function, the logistic function). The log-odds is assumed to be normally distributed about your predicted mean, sure.
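To make that GLM view concrete, here is a small sketch (with an arbitrary, made-up linear predictor) of the logit link and its inverse, the logistic function:

```python
import numpy as np

def logit(p):
    # link function: probability -> log-odds
    return np.log(p / (1 - p))

def logistic(z):
    # inverse link: log-odds -> probability
    return 1 / (1 + np.exp(-z))

z = 0.8                 # a hypothetical log-odds value from the linear (regression) part
p_hat = logistic(z)
print(p_hat)            # ~0.69
print(logit(p_hat))     # recovers 0.8
```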

But this does not mean that log loss applies in other regression problems, no. You are not computing log loss on the log-odds here, but on the probabilities.


Background:
The concept of Cross Entropy is inherited from Information theory, where it is used to measure the difference between the distributions of two or more events. Events, as you would appreciate, are a discrete concept and translate to classes in the case of ML classification problems. This is the reason that Cross Entropy is only applicable to Bernoulli/Multinoulli (categorical) distributions.
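For example (a minimal sketch, with class probabilities made up for illustration), the cross entropy between two multinoulli distributions $p$ and $q$ is $H(p, q) = -\sum_i p_i \log q_i$:

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # "true" distribution over 3 classes
q = np.array([0.5, 0.3, 0.2])   # predicted distribution over the same classes

H = -np.sum(p * np.log(q))      # H(p, q) = -sum_i p_i * log(q_i)
print(H)                        # ~0.89
```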

Regarding your question:
It is not clear why you mention Logistic regression and then ask about the applicability of Cross Entropy (aka LogLoss in the case of Logistic regression) to regression problems (the name may have confused you?). Since Logistic regression is a classification model, everything fits well in place.

EDIT 1: If you take a normal distribution (hence, continuous) and discretize it using bins, you convert it into a multinoulli distribution where the area under the curve of each bin acts as the $p_i$ of the events/classes. You can then easily calculate cross entropy for this transformed distribution; however, it is no longer a normal distribution.
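That discretization can be sketched as follows (the bin edges and the two normal distributions are arbitrary choices for illustration, using SciPy's norm for the CDF):

```python
import numpy as np
from scipy.stats import norm

edges = np.linspace(-5, 5, 21)                    # 20 bins covering most of the mass
p = np.diff(norm.cdf(edges, loc=0, scale=1))      # area under the curve of each bin
q = np.diff(norm.cdf(edges, loc=0.5, scale=1.2))  # a second, shifted and wider normal

p, q = p / p.sum(), q / q.sum()                   # renormalize away the truncated tails
cross_entropy = -np.sum(p * np.log(q))
print(cross_entropy)
```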
