How to propagate the error delta in backpropagation in convolutional neural networks (CNN)?

My CNN has the following structure:

  • Output neurons: 10
  • Input matrix (I): 28x28
  • Convolutional layer (C): 3 feature maps with a 5x5 kernel (output dimension is 3x24x24)
  • Max pooling layer (MP): size 2x2 (output dimension is 3x12x12)
  • Fully connected layer (FC): 432x10 (the 3x12x12 max pooling output, flattened into a vector of 3*12*12 = 432 elements)
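To make the shapes concrete, here is a minimal NumPy sketch of the forward pass for this architecture. It assumes sigmoid activations in both the convolutional and fully connected layers, a stride-1 "valid" cross-correlation, non-overlapping 2x2 max pooling, one shared bias per feature map, and random weights; the helpers `conv_valid` and `max_pool_2x2` are hypothetical names, not taken from the question.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_valid(image, kernels, biases):
    """Stride-1 'valid' cross-correlation of a 2-D image with a stack of kernels."""
    k = kernels.shape[-1]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    # one shared bias per feature map, broadcast over every output position
    z = np.zeros((kernels.shape[0], out_h, out_w)) + biases[:, None, None]
    for m in range(k):
        for n in range(k):
            # add w[m, n] * o[i+m, j+n] for all output positions (i, j) at once
            z += kernels[:, m, n][:, None, None] * image[m:m + out_h, n:n + out_w]
    return z

def max_pool_2x2(a):
    """Non-overlapping 2x2 max pooling applied to each feature map."""
    c, h, w = a.shape
    return a.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

rng = np.random.default_rng(0)
I = rng.random((28, 28))                     # input image
K = rng.standard_normal((3, 5, 5)) * 0.1     # 3 kernels of 5x5
b_conv = np.zeros(3)                         # one bias per feature map
W_fc = rng.standard_normal((10, 432)) * 0.1  # FC weights
b_fc = np.zeros(10)

z_conv = conv_valid(I, K, b_conv)  # (3, 24, 24)
a_conv = sigmoid(z_conv)
a_pool = max_pool_2x2(a_conv)      # (3, 12, 12)
a_flat = a_pool.reshape(-1)        # (432,)
z_fc = W_fc @ a_flat + b_fc        # (10,)
a_fc = sigmoid(z_fc)
print(z_conv.shape, a_pool.shape, a_flat.shape, a_fc.shape)
```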

After making the forward pass, I calculate the error delta in the output layer as:

$\delta^L = (a^L-y) \odot \sigma'(z^L) \qquad (1)$

Here $a^L$ is the predicted value (the activation of the output layer) and $z^L$ is the weighted input: the dot product of the weights and the previous layer's activations, plus the biases.
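Equation (1) is the output error for the quadratic cost $C = \frac{1}{2}\|a^L - y\|^2$. A minimal NumPy sketch of it, assuming a sigmoid output layer and a one-hot target `y` (both are assumptions; the question does not state the activation or the cost):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def output_delta(z_L, y):
    """Equation (1): delta^L = (a^L - y) * sigma'(z^L), elementwise."""
    a_L = sigmoid(z_L)
    return (a_L - y) * sigmoid_prime(z_L)  # Hadamard product

z_L = np.array([0.5, -1.0, 2.0, 0.0, 0.1, -0.3, 1.2, -2.0, 0.7, 0.0])
y = np.zeros(10)
y[2] = 1.0                                 # one-hot target: class 2
delta_L = output_delta(z_L, y)
print(delta_L.shape)  # (10,)
```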

I calculate the error deltas for the earlier layers (moving backwards through the network) with:

$\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l) \qquad (2)$

and the derivative of the cost with respect to the weights is

$\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j \qquad (3)$
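Applied to a whole layer at once, equation (3) is an outer product: the gradient of the FC weights is $\delta^L (a^{L-1})^T$, a matrix with the same 10x432 shape as the weights, and the bias gradient is $\delta^L$ itself. A minimal sketch, with random placeholder values standing in for the real delta and the flattened pooling output; the learning rate `eta` is a hypothetical value.

```python
import numpy as np

rng = np.random.default_rng(1)
delta_L = rng.standard_normal(10)  # output error from equation (1)
a_prev = rng.random(432)           # flattened max pooling output a^{L-1}

# Equation (3) for every (j, k) at once: dC/dw_jk = a_k * delta_j
grad_W_fc = np.outer(delta_L, a_prev)  # shape (10, 432), same as the FC weights
grad_b_fc = delta_L                    # dC/db_j = delta_j

# Plain gradient-descent step with a hypothetical learning rate
eta = 0.1
W_fc = rng.standard_normal((10, 432))
W_fc_new = W_fc - eta * grad_W_fc
```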

I'm able to update the weights (and biases) of $FC$ with no problem. At this point, the error delta $\delta$ is 10x1.

To calculate the error delta for $MP$, I take the dot product of the transposed $FC$ weight matrix and the output error delta, as defined in equation (2). That gives me an error delta of 432x1. Because this layer has no parameters, and the flattening and vectorization are just a reshape, I only need to reverse that process and reshape the delta to 3x12x12, which gives the error in $MP$.
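Backpropagating through the FC layer is the $(w^{l+1})^T \delta^{l+1}$ part of equation (2); there is no $\sigma'$ factor here because the pooled values feed the FC layer directly, with no activation in between. A minimal sketch, assuming the forward pass flattened with NumPy's default row-major order (`a_pool.reshape(-1)`), so that `reshape(3, 12, 12)` inverts it exactly; the arrays are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
W_fc = rng.standard_normal((10, 432))  # FC weights
delta_L = rng.standard_normal(10)      # output error

# (w^L)^T delta^L: error w.r.t. each of the 432 flattened pooled activations
delta_flat = W_fc.T @ delta_L          # shape (432,)

# Undo the flatten; only valid if the forward pass used a_pool.reshape(-1)
delta_pool = delta_flat.reshape(3, 12, 12)
```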

To find the error delta for $C$, I upsample the error delta by following the reverse process of the max pooling, ending with a 3x24x24 delta. Taking the Hadamard product of each of those matrices with the corresponding $\sigma'$ of the feature maps gives me the error delta for $C$.
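The reverse of max pooling sends each pooled error back to the position that held the maximum in its 2x2 window and puts zeros everywhere else; the Hadamard product with $\sigma'(z)$ of the convolution outputs then gives the convolutional error. A minimal sketch, assuming sigmoid activations and non-overlapping 2x2 windows; `maxpool_backward` is a hypothetical helper, ties are resolved by routing to the first maximum (as `np.argmax` does), and the arrays are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def maxpool_backward(a_conv, delta_pool):
    """Route each pooled error to the argmax of its 2x2 window; zeros elsewhere."""
    c, h, w = a_conv.shape
    # (c, h/2, w/2, 4): the four values of every non-overlapping 2x2 window
    windows = (a_conv.reshape(c, h // 2, 2, w // 2, 2)
                     .transpose(0, 1, 3, 2, 4)
                     .reshape(c, h // 2, w // 2, 4))
    winner = windows.argmax(axis=-1)                    # which slot held the max
    routed = np.eye(4)[winner] * delta_pool[..., None]  # error only in that slot
    # put the 4 slots of each window back in their (row, col) places
    return (routed.reshape(c, h // 2, w // 2, 2, 2)
                  .transpose(0, 1, 3, 2, 4)
                  .reshape(c, h, w))

rng = np.random.default_rng(3)
z_conv = rng.standard_normal((3, 24, 24))      # conv pre-activations
a_conv = sigmoid(z_conv)
delta_pool = rng.standard_normal((3, 12, 12))  # error at MP (from the FC layer)

delta_up = maxpool_backward(a_conv, delta_pool)  # (3, 24, 24)
sig_prime = a_conv * (1.0 - a_conv)              # sigma'(z) for a sigmoid
delta_conv = delta_up * sig_prime                # Hadamard product: error at C
```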

But now, how am I supposed to update the kernels, if they're 5x5 and $I$ is 28x28? I have the error delta for the layer, but I don't know how to update the weights with it. The same goes for the bias, since it's a single value for the whole feature map.


"To find the error delta for C, I upsample the error delta following the reverse process of the max pooling ending with a 3x24x24 delta. Finding the hadamard product of each of those matrixes with each of the σ' of the feature maps gives me the error delta for C."

Why are you doing the upsampling? I just don't understand that step. Your initial input is 28x28; after running 3 kernels of 5x5 you get 3x24x24, and after max pooling with stride 2 your output is 3x12x12. You then flatten it and connect it to the 10 neurons of the output layer. So what are your weight matrices here? You have two weight tensors: 1) the convolution part gives you a 3x5x5 weight tensor, and 2) the FC layer gives 10x432. Pooling layers don't have weights, so skip that part when computing weight gradients; the Jacobian for the convolution part should match its shape, i.e. 3x5x5. When implementing it, treat each kernel as a layer of a neural network and then do the calculation for all three in parallel.
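To illustrate the shape argument: the kernel gradient has the same 3x5x5 shape as the kernels, and all three kernels can be handled in parallel because they share the same input. A minimal sketch, assuming a stride-1 "valid" cross-correlation in the forward pass and a 3x24x24 convolutional error `delta_conv` obtained as described in the question; `kernel_gradients` is a hypothetical helper, the bias is assumed to be shared per feature map, and random placeholders stand in for the real arrays.

```python
import numpy as np

def kernel_gradients(image, delta_conv, k=5):
    """dC/dw[f, m, n] = sum_{i,j} delta_conv[f, i, j] * image[i+m, j+n], for every kernel f."""
    n_f, out_h, out_w = delta_conv.shape
    grad = np.zeros((n_f, k, k))
    for m in range(k):
        for n in range(k):
            patch = image[m:m + out_h, n:n + out_w]                  # (24, 24), shared by all kernels
            grad[:, m, n] = np.sum(delta_conv * patch, axis=(1, 2))  # all 3 kernels at once
    return grad

rng = np.random.default_rng(4)
I = rng.random((28, 28))                       # input image
delta_conv = rng.standard_normal((3, 24, 24))  # error at the convolutional layer

grad_K = kernel_gradients(I, delta_conv)  # same shape as the kernels
grad_b = delta_conv.sum(axis=(1, 2))      # one shared bias per feature map
print(grad_K.shape, grad_b.shape)  # (3, 5, 5) (3,)
```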

For details, please refer to this tutorial:

So you are correct that the principle of backpropagation is to do the reverse of the operations. The same is true for the convolutional layer.

The forward pass of the convolutional layer can be expressed by

$x_{i, j}^l = \sum_m \sum_n w_{m,n}^l o_{i+m, j+n}^{l-1} + b_{i, j}^l$.

Here $m$ and $n$ index the rows and columns of the convolutional kernel that you pass over your input image, and $w$ is the associated kernel weight. $o$ is the input feature map and $x$ is the resulting value, belonging to layers $l-1$ and $l$ respectively.

For backpropagation we will want to compute $\frac{\partial x}{\partial w}$.

$\frac{\partial x^l_{i, j}}{\partial w^l_{m', n'}} = \frac{\partial}{\partial w^l_{m', n'}} (\sum_m \sum_n w_{m,n}^l o_{i+m, j+n}^{l-1} + b_{i, j}^l)$.

Expanding the summation, we see that the only term with a non-zero derivative is the one where $m=m'$ and $n=n'$ (the bias does not depend on $w$). We then get

$\frac{\partial x^l_{i, j}}{\partial w^l_{m', n'}} = o^{l-1}_{i+m', j+n'}$.

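We can then combine this result with the error delta $\delta^l_{i,j} = \frac{\partial C}{\partial x^l_{i,j}}$ already calculated for the layer. Because the same weight is used at every output position, the chain rule sums over all of them: $\frac{\partial C}{\partial w^l_{m',n'}} = \sum_i \sum_j \delta^l_{i,j}\, o^{l-1}_{i+m', j+n'}$. This is a "valid" cross-correlation of the layer input with $\delta^l$: for a 28x28 input and a 24x24 delta it yields exactly a 5x5 result, one value per kernel weight.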
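A small numerical check of the kernel-weight gradient $\sum_i \sum_j \delta_{i,j}\, o_{i+m',j+n'}$ and, for a bias shared across the feature map as in the question, of the bias gradient $\sum_i \sum_j \delta_{i,j}$. This is a minimal single-channel sketch that compares both against central finite differences; it assumes a toy cost $C = \frac{1}{2}\sum_{i,j} x_{i,j}^2$ (chosen so that $\delta = x$), and the sizes (8x8 input, 3x3 kernel) are arbitrary.

```python
import numpy as np

def conv_forward(o, w, b):
    """x[i, j] = sum_m sum_n w[m, n] * o[i+m, j+n] + b (single channel, shared bias)."""
    k = w.shape[0]
    out_h, out_w = o.shape[0] - k + 1, o.shape[1] - k + 1
    x = np.full((out_h, out_w), b, dtype=float)
    for m in range(k):
        for n in range(k):
            x += w[m, n] * o[m:m + out_h, n:n + out_w]
    return x

def cost(o, w, b):
    x = conv_forward(o, w, b)
    return 0.5 * np.sum(x ** 2)  # toy cost, so delta = dC/dx = x

rng = np.random.default_rng(5)
o = rng.standard_normal((8, 8))
w = rng.standard_normal((3, 3))
b = 0.3

delta = conv_forward(o, w, b)  # dC/dx for the toy cost
out_h, out_w = delta.shape

# Analytic gradients from the derivation
grad_w = np.array([[np.sum(delta * o[m:m + out_h, n:n + out_w]) for n in range(3)]
                   for m in range(3)])
grad_b = np.sum(delta)

# Central finite differences for comparison
eps = 1e-6
num_w = np.zeros_like(w)
for m in range(3):
    for n in range(3):
        wp, wm = w.copy(), w.copy()
        wp[m, n] += eps
        wm[m, n] -= eps
        num_w[m, n] = (cost(o, wp, b) - cost(o, wm, b)) / (2 * eps)
num_b = (cost(o, w, b + eps) - cost(o, w, b - eps)) / (2 * eps)

print(np.allclose(grad_w, num_w, atol=1e-5), np.isclose(grad_b, num_b, atol=1e-5))
```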

