Why is cross entropy based on Bernoulli or Multinoulli probability distribution?
        When we use logistic regression, we use cross entropy as the loss function. However, based on my understanding and https://machinelearningmastery.com/cross-entropy-for-machine-learning/, cross entropy evaluates if two or more distributions are similar to each other. And the distributions are assumed to be Bernoulli or Multinoulli. So, my question is: why we can always use cross entropy, i.e., Bernoulli in regression problems? Does the real values and the predicted values always follow such distribution?
    
    Category:
        
        Data Science