ResNet: Deriving the gradient matrices w.r.t. W1 and W2 and the backpropagation equations in a residual block
How would I go about deriving, step by step, the gradient matrices with respect to $W_1$ and $W_2$ (as used in a stochastic gradient descent update) and the backpropagation equations for a residual block that is part of a larger ResNet, where forward propagation is expressed as:
$$ F(x) = W_2 \, g_1(W_1 x) $$
$$ y = g_2\big(F(x) + x\big) $$
and $g_1$, $g_2$ are component-wise non-linear activation functions.
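For concreteness, here is a minimal NumPy sketch of the forward pass as I understand it (my own assumptions: $g_1 = g_2 = \mathrm{ReLU}$, square weight matrices, no biases, and a scalar loss $L = \tfrac{1}{2}\|y\|^2$; all function names are mine), together with a central-difference check that I plan to compare any hand-derived gradient against:

```python
import numpy as np

def relu(z):
    # Component-wise non-linearity; stands in for g1 and g2 here.
    return np.maximum(0.0, z)

def forward(x, W1, W2):
    # y = g2(F(x) + x) with F(x) = W2 g1(W1 x)
    h = relu(W1 @ x)          # g1(W1 x)
    return relu(W2 @ h + x)   # g2(F(x) + x), residual skip adds x

def loss(x, W1, W2):
    # Assumed scalar loss for checking purposes: L = 0.5 * ||y||^2
    y = forward(x, W1, W2)
    return 0.5 * np.dot(y, y)

# Random test point (dimension d is arbitrary; square W1, W2 assumed
# so that the skip connection x + F(x) is shape-compatible).
rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal(d)
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, d))

# Central-difference estimate of dL/dW2, entry by entry.
eps = 1e-6
num_grad_W2 = np.zeros_like(W2)
for i in range(d):
    for j in range(d):
        E = np.zeros_like(W2)
        E[i, j] = eps
        num_grad_W2[i, j] = (loss(x, W1, W2 + E) - loss(x, W1, W2 - E)) / (2 * eps)

print(num_grad_W2)
```

I can verify candidate gradient matrices numerically this way, but what I am after is the analytic derivation: how the upstream error signal propagates through $g_2$, splits across the residual branch and the identity skip connection, and yields the gradients w.r.t. $W_2$ and $W_1$ via the chain rule.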