Should I rescale losses before combining them for multitask learning?
I have a multitask network that takes a single input and performs two tasks, with several shared layers followed by separate task-specific layers.
One task is multiclass classification using the cross-entropy (CE) loss; the other is sequence recognition using the CTC loss.
I want to use a combination of the two losses as the criterion, something like Loss = λ·CE + (1−λ)·CTC. The problem is that my CE loss starts around 2, while the CTC loss starts in the 400s.
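For concreteness, here is roughly how I combine them now. This is just a minimal PyTorch sketch: the random tensors stand in for the outputs of my two heads, and lam is a placeholder for λ.

```python
import torch
import torch.nn as nn

ce_criterion = nn.CrossEntropyLoss()
ctc_criterion = nn.CTCLoss(blank=0)

lam = 0.5  # placeholder for λ

# Dummy shapes standing in for my network's two heads:
# classification head: (batch, num_classes); CTC head: (time, batch, num_classes)
batch, num_classes, time_steps, target_len = 8, 20, 50, 10
class_logits = torch.randn(batch, num_classes, requires_grad=True)
class_targets = torch.randint(0, num_classes, (batch,))

seq_log_probs = torch.randn(time_steps, batch, num_classes, requires_grad=True).log_softmax(2)
seq_targets = torch.randint(1, num_classes, (batch, target_len))  # labels exclude the blank index
input_lengths = torch.full((batch,), time_steps, dtype=torch.long)
target_lengths = torch.full((batch,), target_len, dtype=torch.long)

loss_ce = ce_criterion(class_logits, class_targets)    # starts around 2 in my runs
loss_ctc = ctc_criterion(seq_log_probs, seq_targets, input_lengths, target_lengths)  # starts in the 400s

# Weighted combination: the CTC term dominates because of its raw magnitude.
loss = lam * loss_ce + (1 - lam) * loss_ctc
loss.backward()
```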
Should I rescale the losses with a Max(L₁)/L₁ factor, where Max(L₁) is the larger of the two sub-losses at epoch 1 and L₁ is the value of each sub-loss at epoch 1? In other words, the losses would be scaled so that they have the same magnitude at the first epoch, and those same factors would then be applied at every subsequent epoch.
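This is the rescaling scheme I have in mind, reusing lam, loss_ce, and loss_ctc from the sketch above; the epoch-1 values here are just the rough magnitudes I observe.

```python
# Measured once after epoch 1 (rough magnitudes from my runs), then frozen:
epoch1_ce, epoch1_ctc = 2.0, 400.0

max_l1 = max(epoch1_ce, epoch1_ctc)
scale_ce = max_l1 / epoch1_ce    # ≈ 200
scale_ctc = max_l1 / epoch1_ctc  # = 1

# Applied at every subsequent epoch, so both terms start at the same magnitude:
loss = lam * (scale_ce * loss_ce) + (1 - lam) * (scale_ctc * loss_ctc)
```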
Is there a better approach? How do I ensure that the two losses influence backpropagation according to λ, rather than according to their raw magnitudes?