ResNet: Derive the gradient matrices w.r.t. W1 and W2 and the backpropagation equation in a residual network

How would I go about deriving, step by step, the stochastic gradient matrices with respect to W1 and W2 and the backpropagation equation for a residual block that is part of a larger ResNet, where the forward propagation is expressed as:

$$ F(x) = W_2 \, g_1(W_1 x) $$

$$ y = g_2(F(x) + x) $$

where $g_1$ and $g_2$ are component-wise non-linear activation functions.
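For concreteness, here is a minimal NumPy sketch of the forward pass I have in mind, together with a finite-difference check of the gradient I am trying to derive analytically. The choice of ReLU for $g_1$ and $g_2$, the square weight shapes, and the toy loss $L = \sum_i y_i$ are my own assumptions purely for illustration:

```python
import numpy as np

def relu(z):
    # Example component-wise activation; g1 and g2 could be any such function.
    return np.maximum(z, 0.0)

def forward(x, W1, W2):
    h = relu(W1 @ x)      # g1(W1 x)
    F = W2 @ h            # F(x) = W2 g1(W1 x)
    y = relu(F + x)       # y = g2(F(x) + x), the skip connection adds x
    return y

# Toy shapes (assumption): x in R^d, W1 and W2 square so that F(x) + x is valid.
d = 4
rng = np.random.default_rng(0)
x  = rng.normal(size=d)
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))

def loss(W1, W2):
    # Illustrative scalar loss; the analytic dL/dW1 and dL/dW2 are what I want to derive.
    return forward(x, W1, W2).sum()

# Central finite differences for dL/dW2, as a reference to check a hand derivation against.
eps = 1e-6
num_grad_W2 = np.zeros_like(W2)
for i in range(d):
    for j in range(d):
        W2p = W2.copy(); W2p[i, j] += eps
        W2m = W2.copy(); W2m[i, j] -= eps
        num_grad_W2[i, j] = (loss(W1, W2p) - loss(W1, W2m)) / (2 * eps)
print(num_grad_W2)
```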

Tags: sgd, convolutional-neural-network, gradient-descent, neural-network, machine-learning
