How to interpret the Binary Cross Entropy loss function?

I saw some examples of autoencoders (on images) which use a sigmoid output layer and BinaryCrossentropy as the loss function.

The input to the autoencoder is normalized to [0..1], and the sigmoid outputs values (the value of each pixel of the image) in [0..1].

I tried to evaluate the output of BinaryCrossentropy and I'm confused.

Assume for simplicity we have a [2x2] image, run the autoencoder, and get two results. One result is close to the true value and the second is identical to the true value:

import numpy as np
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

# Soft (non-binary) targets in [0, 1]
y_true = [0.5, 0.3, 0.5, 0.9]

# First prediction: close to the targets
y_pred = [0.1, 0.3, 0.5, 0.8]
print(bce(y_true, y_pred).numpy())

# Second prediction: identical to the targets
y_pred = [0.5, 0.3, 0.5, 0.9]
print(bce(y_true, y_pred).numpy())

Results:

0.71743906
0.5805602

As you can see, the second example (which is identical to the true values) gets a lower loss, but it is still not 0 or even close to 0.

It seems that using BinaryCrossentropy as the loss function won't give us the best results, since we never get values close to zero?

Will the best value be close to 0.5?

What am I missing?

Topic sigmoid autoencoder loss-function deep-learning machine-learning

Category Data Science


Binary cross entropy is intended to be used with data that take values in $\{0,1\}$ (hence binary). The loss function is given by $$ \mathcal{L}_n = - \left[ y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log (1 - \sigma(x_n)) \right]$$ for a single sample $n$ (taken from the PyTorch documentation), where $\sigma(x_n)$ is the predicted output.
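To make the formula concrete, here is a minimal NumPy sketch (the helper name bce_per_sample is mine, not from any library) that evaluates $\mathcal{L}_n$ directly and reproduces the 0.5806 from the question:

import numpy as np

def bce_per_sample(y, s):
    # L_n = -[ y_n * log(sigma(x_n)) + (1 - y_n) * log(1 - sigma(x_n)) ]
    return -(y * np.log(s) + (1 - y) * np.log(1 - s))

y_true = np.array([0.5, 0.3, 0.5, 0.9])

# Prediction identical to the target, as in the question's second case
per_sample = bce_per_sample(y_true, y_true)
print(per_sample)         # approx. [0.693 0.611 0.693 0.325]
print(per_sample.mean())  # approx. 0.5806 -- matches the Keras result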

For $y_n=0$ or $y_n=1$, the loss, viewed as a function of $\sigma(x_n)$, is 0 only when the prediction matches the target, i.e. $\sigma(x_n)=y_n$, as you can see in the plot below. And although it's not what the binary cross entropy loss is intended for, you could in principle have a target value of $y_n=0.5$; the loss would then reach its minimum at $\sigma(x_n)=0.5$, but that minimum would not be 0.
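Concretely, for a target of $y_n=0.5$ the minimum of the loss, reached at $\sigma(x_n)=0.5$, is $$ \mathcal{L}_n^{\min} = -\left[ 0.5 \cdot \log 0.5 + 0.5 \cdot \log 0.5 \right] = \log 2 \approx 0.693. $$ More generally, for a soft target the floor is the binary entropy $-\left[ y_n \log y_n + (1 - y_n) \log (1 - y_n) \right]$, which is 0 only for $y_n \in \{0, 1\}$.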

In the plot below I show the loss function $\mathcal{L}(\sigma(x_n))$ for various values of the target $y_n$:

[Plot: $\mathcal{L}(\sigma(x_n))$ against $\sigma(x_n)$ for several target values $y_n$; each curve has its minimum at $\sigma(x_n)=y_n$, and the minimum is 0 only for $y_n=0$ and $y_n=1$.]


Binary cross entropy loss assumes that the values you are trying to predict are either 0 or 1, not continuous between 0 and 1 as in your example. Because of this, even if the predicted values are equal to the actual values, your loss will not be equal to 0. Using values of either 0 or 1 does return a loss of zero when the predicted values equal the true values:

import torch
from torch.nn import BCELoss

loss = BCELoss()

# Soft targets: prediction equals target, yet the loss is not 0
true = torch.tensor([0.5, 0.3, 0.5, 0.9])
pred = torch.tensor([0.5, 0.3, 0.5, 0.9])

loss(pred, true)  # BCELoss expects (input, target)
# tensor(0.5806)

# Hard (binary) targets: a perfect prediction gives a loss of exactly 0
true = torch.tensor([1.0, 0.0, 1.0, 1.0])
pred = torch.tensor([1.0, 0.0, 1.0, 1.0])

loss(pred, true)
# tensor(0.)
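This also explains the exact value 0.5806 in the question: with soft targets, the loss bottoms out at the mean binary entropy of the targets rather than at zero. A quick check (this snippet is my addition, not part of the original answer):

import torch

true = torch.tensor([0.5, 0.3, 0.5, 0.9])

# Binary entropy of the targets: the lowest BCE achievable, attained when pred == true
floor = -(true * torch.log(true) + (1 - true) * torch.log(1 - true)).mean()
print(floor)
# tensor(0.5806)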
