The effects of Double Logarithms (Log Cross Entropy Loss) + Overfitting

My network involves two losses: one is a binary cross entropy, and the other is a multi-label cross entropy.

The yellow graphs are the ones with the double logarithm, i.e. we use log(sum(ce_loss)). The red-pink graphs are the ones with just sum(ce_loss).

The dashed lines represent the validation steps; the solid lines represent the training steps.

The top yellow and top red-pink figures both show the count of 1s, and both are supposed to converge to 30. The top yellow figure clearly shows that both training and validation converge to 30, whereas the red-pink count figure shows that only training converges to 30 while validation converges to 0!

My questions are:

  1. I have not been able to find any literature in which anyone uses a double logarithm (the yellow graphs), yet the results are clearly much better than with a plain cross-entropy loss (the red-pink graphs).

Does anyone know why adding a logarithm on top of a cross-entropy loss would improve the results? My original purpose in adding the outer logarithm to the CE loss was to improve numerical stability, and it does seem to serve that purpose.
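For reference, here is roughly how I compute the two variants (a PyTorch-style sketch; the function name, the per-class BCE for the multi-label head, and the epsilon are illustrative, not my exact code):

```python
import torch
import torch.nn.functional as F

def combined_loss(bin_logits, bin_targets, multi_logits, multi_targets,
                  double_log=True, eps=1e-8):
    """Sum the two cross-entropy terms, optionally wrapping them in an outer log."""
    # Binary cross-entropy head (summed over the batch).
    bce = F.binary_cross_entropy_with_logits(bin_logits, bin_targets, reduction="sum")
    # Multi-label head, treated here as per-class binary cross-entropy
    # (a common choice; substitute whatever CE variant the network actually uses).
    mlce = F.binary_cross_entropy_with_logits(multi_logits, multi_targets, reduction="sum")

    total = bce + mlce                  # plain sum(ce_loss)  -> the red-pink runs
    if double_log:
        total = torch.log(total + eps)  # log(sum(ce_loss))   -> the yellow runs
    return total
```

With `double_log=True` this corresponds to the yellow runs; with `double_log=False` it corresponds to the red-pink runs.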

  2. The yellow graphs have high training and validation accuracy (both reach the expected count of thirty 1s), though the validation loss (middle graph) is increasing. Is this a case of overfitting?

  3. The red graphs have high training accuracy but poor validation accuracy, and the validation loss for both losses (middle and right graphs) is increasing. Is this also a case of overfitting?

My colleague is questioning the use of the double logarithm, but it clearly seems to perform better than the version without the second (outer) logarithm.

Any advice and suggestions would be great! Thank you.

Tags: cross-entropy, loss-function, accuracy, neural-network
