Confusion with L2 Regularization in Back-propagation
In very simple terms, this is L2 regularization:
$$Loss_R = Loss_N + \lambda \sum_i w_i^2$$
$Loss_N$ - Loss without regularization
$Loss_R$ - Loss with regularization
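For concreteness, here is a minimal NumPy sketch of what I mean by the first equation (the name `lambda_reg` for the regularization strength is my own, not from any particular framework):

```python
import numpy as np

def l2_regularized_loss(loss_n, weights, lambda_reg=0.01):
    """Loss_R = Loss_N + lambda * sum_i w_i^2 (a sketch, not a specific library's API)."""
    penalty = lambda_reg * sum(np.sum(w ** 2) for w in weights)
    return loss_n + penalty
```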
When implementing this [Ref], we simply add the derivative of the new penalty term to the current weight delta:
$$dw = dw_N + \text{constant} \cdot w$$
$dw_N$ - Weight delta without regularization (the constant absorbs the factor of 2 from differentiating $w^2$ together with $\lambda$)
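And this is the weight update I am describing, again as a sketch with the assumed `lambda_reg`:

```python
def l2_regularized_grad(dw_n, w, lambda_reg=0.01):
    """dw = dw_N + constant * w, where constant = 2 * lambda,
    since d/dw (lambda * w^2) = 2 * lambda * w."""
    return dw_n + 2 * lambda_reg * w
```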
What I think: L2 regularization is achieved by this last step alone, i.e. the weight itself is penalized in the update.
My question is:
Why do we then also add the penalty to the total loss, as in the first equation? Won't that put an additional penalty on every weight during back-propagation (through the $dw_N$ component) because of the increased loss? I could understand it if it were only for console printing, but I believe it is not.
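To make the question concrete, this is the kind of comparison I have in mind, on a toy one-weight model (all numbers are arbitrary, chosen only for illustration):

```python
# Toy one-weight model y = w * x with squared-error loss; numbers are arbitrary.
x, t, w, lam = 1.5, 2.0, 0.8, 0.1

def loss_n(w):            # Loss_N: no regularization
    return (w * x - t) ** 2

def loss_r(w):            # Loss_R: penalty added to the loss, as in the first equation
    return loss_n(w) + lam * w ** 2

def num_grad(f, w, eps=1e-6):
    # Central finite difference, just to check the gradients numerically.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

dw_n = num_grad(loss_n, w)
dw_r = num_grad(loss_r, w)
print(dw_n, dw_r, dw_r - dw_n)   # difference comes out as ~2 * lam * w
print(2 * lam * w)               # the "constant * w" term from the update rule
```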
I know I am missing something very simple.
Tags: mathematics, regularization, backpropagation
Category: Data Science