Should the cost function be zero using TensorFlow's sigmoid_cross_entropy_with_logits?

Question


WilsonPena

May 15, 2022, 17:00

I'm building a CNN for binary classification (1 or 0). For this, I'm using the cost function sigmoid_cross_entropy_with_logits.

But for some reason, the cost computed by this function is never equal to zero, even when the prediction is equal to the correct value.

I tried plotting the output using the formula on TensorFlow's website: https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits

This formula:

max(x, 0) - x * z + log(1 + exp(-abs(x)))

And by making this plot, I realized that it really isn't zero when the outputs are equal. For example, if z = 0 and x = 0, the result of this function is ~0.693.
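The check described above can be reproduced without TensorFlow. The following is a minimal sketch in plain Python that evaluates the documented formula directly; the helper name `sigmoid_ce_from_logit` is hypothetical and is not part of TensorFlow's API.

```python
import math


def sigmoid_ce_from_logit(x, z):
    """Evaluate max(x, 0) - x * z + log(1 + exp(-abs(x))) for logit x and label z.

    Hypothetical helper: a plain-Python copy of the documented formula,
    not a call into TensorFlow.
    """
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


# Logit 0 with label 0: the loss is ln(2), about 0.693, not zero.
loss_at_zero = sigmoid_ce_from_logit(0.0, 0.0)
print(loss_at_zero)

# Strongly negative logit with label 0, and strongly positive logit with
# label 1: the loss becomes very small but stays above zero.
loss_confident_0 = sigmoid_ce_from_logit(-10.0, 0.0)
loss_confident_1 = sigmoid_ce_from_logit(10.0, 1.0)
print(loss_confident_0, loss_confident_1)
```

Here `x` is treated as the raw network output before any sigmoid, which is how the formula on the documentation page is written.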

This isn't really making sense to me. Can someone shed some light on why it isn't zero when the prediction is correct?

Topic cost-function tensorflow python machine-learning

Category Data Science

Yohanes Alfredo · Accepted Answer · December 11, 2019, 13:28

Here is where you are wrong. Sigmoid cross-entropy with logits operates on logits rather than probabilities, so the loss is calculated from the logit, before the sigmoid function is applied.

Now $x = 0$ here is equivalent to: $$P(Y=1|X;\theta)= \frac{1}{1+e^{-0}} = 0.5$$ For shorter notation, I shall denote the quantity above as $p$.

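Hence the cross-entropy loss for a single sample is: $$loss(p)=-\left(\mathbb{1}_{y=1}\ln{(p)} + \mathbb{1}_{y=0}\ln{(1-p)}\right)$$ Because $P(Y=1|X;\theta)=P(Y=0|X;\theta)=0.5$, the label does not matter (WLOG), and the loss equals $-\ln{0.5} = \ln{2} \approx 0.693$, which is exactly the value you get. The loss only approaches zero as $x \to +\infty$ (for $y=1$) or $x \to -\infty$ (for $y=0$), where $p$ tends to 1 or 0.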
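To make the probability view concrete, here is a small sketch in plain Python. The helper names `sigmoid` and `cross_entropy_from_probability` are illustrative assumptions, not TensorFlow functions; the loss is written with the conventional leading minus sign so that it is non-negative.

```python
import math


def sigmoid(x):
    """Map a logit x to the probability P(Y=1)."""
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy_from_probability(p, y):
    """Binary cross-entropy for predicted probability p and label y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


# A logit of 0 corresponds to p = 0.5, i.e. maximal uncertainty.
p = sigmoid(0.0)

# The loss is ln(2), about 0.693, whichever label is the true one.
loss_y1 = cross_entropy_from_probability(p, 1)
loss_y0 = cross_entropy_from_probability(p, 0)
print(p, loss_y1, loss_y0)

# A large positive logit pushes p towards 1 and the loss towards 0 for y = 1.
loss_confident = cross_entropy_from_probability(sigmoid(6.0), 1)
print(loss_confident)
```

The loss reaches small values only when the logit is far from zero in the direction of the true label, which is why a logit of exactly 0 never yields a zero cost.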

hssay · Accepted Answer · August 16, 2018, 17:34

For a binary classification (1 or 0), you should use softmax cross-entropy with logits since the output classes are mutually exclusive / discrete.

The function you used is meant for cases with multiple outputs (multilabel), all of which can be 1 at the same time (the documentation, for example, describes the case where an image can contain both an elephant and a dog). The calculation is different from the mutually exclusive case.
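The difference between the two settings can be sketched in plain Python. The logit values and the `sigmoid` and `softmax` helpers below are hypothetical illustrations, not TensorFlow calls.

```python
import math


def sigmoid(x):
    """Independent probability for a single output."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Probabilities over mutually exclusive classes (sum to 1)."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for "elephant" and "dog" in one image.
logits = [2.0, 1.0]

# Multilabel view: each output gets its own probability, so both can be high.
independent = [sigmoid(v) for v in logits]
print(independent, sum(independent))

# Mutually exclusive view: the probabilities compete and sum to 1.
exclusive = softmax(logits)
print(exclusive, sum(exclusive))

# Special case of exactly two classes with logits [0, x]:
# the softmax probability of the second class equals sigmoid(x).
two_class = softmax([0.0, 1.0])[1]
print(two_class, sigmoid(1.0))
```

Note the last comparison: with exactly two classes, a two-way softmax over logits `[0, x]` and a single sigmoid output on `x` assign the same probability, so the choice mainly affects how the output layer and labels are laid out.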
