CNN gradients with different magnitudes

I have a CNN architecture with two cross-entropy losses $\mathcal{L}_1$ and $\mathcal{L}_2$ summed into the total loss $\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2$. The task I want to solve is Unsupervised Domain Adaptation.

I have observed the following behavior:

  • The gradients coming from $\mathcal{L}_1$ have a much smaller magnitude than those coming from $\mathcal{L}_2$, so the supervision provided by the first loss is negligible.
  • $\mathcal{L}_1$ stays at a constant positive value and does not decrease during training, while $\mathcal{L}_2$ does decrease.

How can I minimize $\mathcal{L}_1$ and how can I make the gradient from $\mathcal{L}_1$ more important? Currently I have two options:

  1. Add a tradeoff parameter to one of the two losses, $\mathcal{L} = \mathcal{L}_1 + \gamma \cdot \mathcal{L}_2$ (see the sketch below)
  2. Normalize the gradients at some step

The last option would be to leave everything as it is, on the grounds that one of the losses does not provide supervision for the task I want to solve. Do you have any advice on which path to follow?
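For concreteness, here is a minimal PyTorch-style sketch of what I mean by option 1; the model, data shapes, and the value of $\gamma$ are just placeholders, not my actual setup:

```python
import torch
import torch.nn as nn

# Placeholders for the actual CNN and data (not the real architecture).
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 32 * 32, 10))
criterion1 = nn.CrossEntropyLoss()   # produces L1
criterion2 = nn.CrossEntropyLoss()   # produces L2
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
gamma = 0.1                          # trade-off hyperparameter (placeholder value)

x = torch.randn(8, 3, 32, 32)        # dummy batch
y1 = torch.randint(0, 10, (8,))      # dummy targets for L1
y2 = torch.randint(0, 10, (8,))      # dummy targets for L2

logits = model(x)
loss = criterion1(logits, y1) + gamma * criterion2(logits, y2)  # L = L1 + gamma * L2

optimizer.zero_grad()
loss.backward()
optimizer.step()
```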

Topic gradient cnn loss-function machine-learning

Category Data Science


If you are able to tune the weighting hyperparameter $\gamma$ (e.g. against a validation metric), then the relative importance of the two losses becomes an empirical question: the validation results will guide the mixture of the losses.
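A rough sketch of that empirical approach, assuming a hypothetical `train_and_evaluate` helper that trains with $\mathcal{L} = \mathcal{L}_1 + \gamma \cdot \mathcal{L}_2$ and returns a validation score (e.g. target-domain accuracy); the grid of $\gamma$ values is only illustrative:

```python
# Hypothetical helper: train with L = L1 + gamma * L2 and return a validation
# score. The body is a stub and must be replaced with your own pipeline.
def train_and_evaluate(gamma: float) -> float:
    return 0.0  # placeholder

candidate_gammas = [0.01, 0.1, 0.3, 1.0, 3.0]   # assumed grid, not prescriptive

best_gamma, best_score = None, float("-inf")
for gamma in candidate_gammas:
    score = train_and_evaluate(gamma)
    if score > best_score:
        best_gamma, best_score = gamma, score

print(f"selected gamma = {best_gamma} (validation score {best_score:.3f})")
```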

Normalizing the gradients is more ad hoc: it simplifies the balancing problem, but it could also oversimplify it.
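If you do go that route, one simple (still ad hoc) variant is to rescale the gradients of $\mathcal{L}_1$ so their global norm matches that of $\mathcal{L}_2$ before combining them. A PyTorch sketch with a placeholder model and dummy data:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)                        # placeholder for the CNN
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(8, 16)                          # dummy batch
y1 = torch.randint(0, 4, (8,))                  # dummy targets for L1
y2 = torch.randint(0, 4, (8,))                  # dummy targets for L2

params = [p for p in model.parameters() if p.requires_grad]
logits = model(x)
loss1 = criterion(logits, y1)
loss2 = criterion(logits, y2)

# Compute each loss's gradients separately.
g1 = torch.autograd.grad(loss1, params, retain_graph=True)
g2 = torch.autograd.grad(loss2, params)

# Rescale L1's gradients so their global norm matches L2's.
norm1 = torch.sqrt(sum(g.pow(2).sum() for g in g1))
norm2 = torch.sqrt(sum(g.pow(2).sum() for g in g2))
scale = norm2 / (norm1 + 1e-12)

optimizer.zero_grad()
for p, a, b in zip(params, g1, g2):
    p.grad = scale * a + b                      # combined, rebalanced gradient
optimizer.step()
```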
