Is the Cross Entropy Loss important at all, given that in Backpropagation only the Softmax probability and the one-hot vector are relevant?

Is the Cross Entropy Loss (CEL) important at all, given that in Backpropagation (BP) only the Softmax (SM) probability and the one-hot vector are relevant?

When applying BP, the derivative of the CEL is just the difference between the output probability (SM) and the one-hot encoded vector. To me, the CEL value itself, however sophisticated it is, does not seem to play any role in learning.
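Concretely, writing the logits as $z$, the softmax output as $p$ and the one-hot target as $y$ (my own notation), with $L = -\sum_k y_k \log p_k$, the step I mean is

$$
\frac{\partial L}{\partial z_i}
= \sum_k \frac{\partial L}{\partial p_k}\,\frac{\partial p_k}{\partial z_i}
= \sum_k \left(-\frac{y_k}{p_k}\right) p_k\,(\delta_{ki} - p_i)
= p_i \sum_k y_k - y_i
= p_i - y_i,
$$

since $\sum_k y_k = 1$ for a one-hot target.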

I'm expecting there is a fallacy in my reasoning, so could somebody please help me out?

Topic softmax backpropagation loss-function deep-learning

Category Data Science


The cross-entropy (CE) loss is the quantity you are trying to minimize: it tells you how well your training procedure has fitted your network to your data. In practice you use it, for example, to compare performance on the training and validation/test sets and for early stopping. You should not use accuracy for that, because you are not optimizing your network for accuracy; also, the loss often keeps decreasing while accuracy stays constant.
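To make that last point concrete, here is a small toy example (my own, not part of the original answer) in which two sets of predictions have the same accuracy but very different CE loss:

```python
import numpy as np

# Two models that make the *same* hard predictions (same accuracy)
# but with different confidence, so only the CE loss distinguishes them.
y = np.array([0, 1, 1])                        # true class indices
p_early = np.array([[0.60, 0.40],              # model early in training
                    [0.45, 0.55],
                    [0.40, 0.60]])
p_late = np.array([[0.90, 0.10],               # same argmax, more confident
                   [0.20, 0.80],
                   [0.10, 0.90]])

def accuracy(p, y):
    return np.mean(np.argmax(p, axis=1) == y)

def cross_entropy(p, y):
    return -np.mean(np.log(p[np.arange(len(y)), y]))

print(accuracy(p_early, y), accuracy(p_late, y))            # 1.0 1.0
print(cross_entropy(p_early, y), cross_entropy(p_late, y))  # loss still drops
```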


During backpropagation we take the derivative of the loss function with respect to all the weight parameters. For the cross-entropy loss combined with softmax, the derivative with respect to the logits is simply the difference between the output (softmax) probability and the one-hot target. Up to this point you're correct.
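As a quick sanity check of that claim (a sketch of my own, not from the answer), you can compare the analytic gradient p - y against a finite-difference estimate of the loss:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss(z, y):
    return -np.log(softmax(z)[y])        # y is the true class index

z = np.array([2.0, -1.0, 0.5])           # arbitrary logits
y = 2                                    # arbitrary true class

analytic = softmax(z) - np.eye(len(z))[y]   # the "simple difference" p - y

eps = 1e-6
numeric = np.array([
    (ce_loss(z + eps * np.eye(len(z))[i], y)
     - ce_loss(z - eps * np.eye(len(z))[i], y)) / (2 * eps)
    for i in range(len(z))
])

print(np.allclose(analytic, numeric, atol=1e-5))   # True
```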

But you missed what the loss function represents. The loss function defines the error surface over the network's parameters, and the purpose of learning is to find a minimum of that surface.

We use the CE loss because the surface defined by it is smooth and differentiable, so we can compute its derivative easily.

Mathematically that derivative works out to a simple difference, but conceptually we are still differentiating the surface defined by the loss function, and that is what matters.
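To see both points together, here is a toy sketch (my own example, directly optimizing raw logits rather than network weights): the update uses only p - y, yet the quantity that shrinks is the CE loss that defines the surface:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([1.0, 0.0, -1.0])        # logits we "train" directly
y = np.array([0.0, 0.0, 1.0])         # one-hot target
lr = 0.5

for step in range(5):
    p = softmax(z)
    loss = -np.sum(y * np.log(p))     # the CE value being minimized
    z -= lr * (p - y)                 # gradient step: only p - y is needed
    print(step, round(loss, 4))       # loss decreases step by step
```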
