How is image convolution actually implemented in deep learning libraries using simple linear algebra?

To clarify up front: what I actually want to implement is cross-correlation, but since the machine learning literature keeps calling it convolution, I will stick with that term.

I am trying to implement image convolution using plain linear algebra. After searching around and thinking about it, I came up with two possible solutions.

The first one: create an appropriate Toeplitz-like matrix out of the kernel, as described here, and multiply it with the flattened image (see the sketch below).
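To make the first option concrete, here is a minimal sketch of what I mean, in plain NumPy, with stride 1, "valid" padding, and a dense matrix (in practice this matrix would be mostly zeros and stored sparse):

```python
import numpy as np

def conv_matrix(kernel, in_h, in_w):
    """Build the (out_h*out_w) x (in_h*in_w) matrix whose product with the
    flattened image gives the 'valid' cross-correlation with stride 1."""
    k_h, k_w = kernel.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    M = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j
            for di in range(k_h):
                for dj in range(k_w):
                    # each output pixel reads k_h*k_w input pixels
                    M[row, (i + di) * in_w + (j + dj)] = kernel[di, dj]
    return M

# tiny example: 4x4 image, 3x3 kernel -> 2x2 output
img = np.arange(16.0).reshape(4, 4)
ker = np.ones((3, 3))
out = conv_matrix(ker, 4, 4) @ img.ravel()
print(out.reshape(2, 2))
```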

The second one: instead of transforming the filter, rearrange the input so that the values touched by each kernel position are stored in column vectors. The kernel, laid out as a row vector, is then multiplied by the matrix that holds the rearranged input (see the sketch below).
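This is roughly the transformation I have in mind for the second option, again in plain NumPy with stride 1 and "valid" padding:

```python
import numpy as np

def im2col(img, k_h, k_w):
    """Stack every k_h x k_w patch of img as one column (stride 1, 'valid')."""
    in_h, in_w = img.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    cols = np.empty((k_h * k_w, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            # one column per output position, holding the flattened patch
            cols[:, i * out_w + j] = img[i:i + k_h, j:j + k_w].ravel()
    return cols

img = np.arange(16.0).reshape(4, 4)
ker = np.ones((3, 3))
out = ker.ravel() @ im2col(img, 3, 3)   # kernel as a 1 x 9 row vector
print(out.reshape(2, 2))
```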

My question: let's assume I have a grayscale image of 960x720 pixels and a kernel of size 3x3.

The first solution would create a massive Toeplitz-like matrix out of the kernel (of size (958 * 718) x (960 * 720) for stride = 1), i.e. about 10^11 elements.

In the second solution, the rearranged input would have only about 6 * 10^6 elements (958 * 718 * 9 for a 3x3 kernel).
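A quick back-of-the-envelope check of the two counts (3x3 kernel, stride 1):

```python
out_h, out_w = 960 - 2, 720 - 2                       # 958 x 718 output positions
toeplitz_elems = (out_h * out_w) * (960 * 720)        # rows x columns of the big matrix
im2col_elems = (out_h * out_w) * 9                    # one 9-element column per output
print(f"{toeplitz_elems:.2e} vs {im2col_elems:.2e}")  # ~4.75e11 vs ~6.19e06
```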

I think it is clear that the second option is much quicker to compute. Am I missing something, or did I miscalculate? The literature keeps referring to Toeplitz matrices for this problem, but I just cannot see what the second solution has in common with them. Does the matrix in the second approach have a specific name?

Is the second one the way to go if I want to implement image convolution? Is there a more efficient solution or a mathematical trick perhaps?

Topic: matrix, linear-algebra, convolutional-neural-network, convolution

Category: Data Science
