How is image convolution actually implemented in deep learning libraries using simple linear algebra?

To clarify up front: what I actually want to implement is cross-correlation, but since the machine learning literature keeps calling it convolution, I will stick with that term.

I am trying to implement image convolution using plain linear algebra. After searching around and thinking about it, I came up with two possible solutions.

The first one: create an appropriate Toeplitz-like matrix out of the kernel, as described here, and multiply it with the flattened image (see the sketch below).
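To make the first option concrete, here is a minimal sketch of what I mean, in plain NumPy, with stride 1, "valid" padding, and a dense matrix (in practice this matrix would be mostly zeros and stored sparse):

```python
import numpy as np

def conv_matrix(kernel, in_h, in_w):
    """Build the (out_h*out_w) x (in_h*in_w) matrix whose product with the
    flattened image gives the 'valid' cross-correlation with stride 1."""
    k_h, k_w = kernel.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    M = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j
            for di in range(k_h):
                for dj in range(k_w):
                    # each output pixel reads k_h*k_w input pixels
                    M[row, (i + di) * in_w + (j + dj)] = kernel[di, dj]
    return M

# tiny example: 4x4 image, 3x3 kernel -> 2x2 output
img = np.arange(16.0).reshape(4, 4)
ker = np.ones((3, 3))
out = conv_matrix(ker, 4, 4) @ img.ravel()
print(out.reshape(2, 2))
```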

The second one: instead of transforming the filter, rearrange the input so that the values touched by each kernel position are stored in column vectors. The kernel, laid out as a row vector, is then multiplied by the matrix that holds the rearranged input (see the sketch below).
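This is roughly the transformation I have in mind for the second option, again in plain NumPy with stride 1 and "valid" padding:

```python
import numpy as np

def im2col(img, k_h, k_w):
    """Stack every k_h x k_w patch of img as one column (stride 1, 'valid')."""
    in_h, in_w = img.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    cols = np.empty((k_h * k_w, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            # one column per output position, holding the flattened patch
            cols[:, i * out_w + j] = img[i:i + k_h, j:j + k_w].ravel()
    return cols

img = np.arange(16.0).reshape(4, 4)
ker = np.ones((3, 3))
out = ker.ravel() @ im2col(img, 3, 3)   # kernel as a 1 x 9 row vector
print(out.reshape(2, 2))
```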

My question: let's assume I have a grayscale image of 960x720 pixels and a kernel of size 3x3.

The first solution would create a massive Toeplitz-like matrix out of the kernel (of size (958 * 718) x (960 * 720) for stride = 1), i.e. about 10^11 elements.

In the second solution, the rearranged input would have only about 6 * 10^6 elements (958 * 718 * 9 for a 3x3 kernel).
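A quick back-of-the-envelope check of the two counts (3x3 kernel, stride 1):

```python
out_h, out_w = 960 - 2, 720 - 2                       # 958 x 718 output positions
toeplitz_elems = (out_h * out_w) * (960 * 720)        # rows x columns of the big matrix
im2col_elems = (out_h * out_w) * 9                    # one 9-element column per output
print(f"{toeplitz_elems:.2e} vs {im2col_elems:.2e}")  # ~4.75e11 vs ~6.19e06
```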

I think it is clear that the second option is much quicker to compute. Am I missing something, or did I miscalculate? The literature keeps referring to Toeplitz matrices for this problem, but I just cannot see what the second solution has in common with them. Does the matrix in the second approach have a specific name?

Is the second one the way to go if I want to implement image convolution? Is there a more efficient solution or a mathematical trick perhaps?

Topic: matrix, linear-algebra, convolutional-neural-network, convolution

Category: Data Science
