How is image convolution actually implemented in deep learning libraries using simple linear algebra?
To clarify: what I actually want to implement is cross-correlation, but the machine learning literature keeps calling it convolution, so I will stick with that term.
I am trying to implement image convolution using plain linear algebra. After reading around and thinking about it, I came up with two possible approaches.
The first one: create an appropriate Toeplitz-like matrix out of the kernel, as described here.
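Here is a rough NumPy sketch of what I mean by the first approach (my own toy example: a 5x4 image, a 3x3 kernel, stride 1, no padding; the function name is made up):

```python
import numpy as np

def kernel_to_matrix(kernel, H, W):
    # Build the Toeplitz-like matrix T so that T @ image.ravel()
    # gives the "valid" cross-correlation output (stride 1, no padding).
    k = kernel.shape[0]
    out_h, out_w = H - k + 1, W - k + 1
    T = np.zeros((out_h * out_w, H * W))
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j
            for di in range(k):
                for dj in range(k):
                    T[row, (i + di) * W + (j + dj)] = kernel[di, dj]
    return T

# toy example: 5x4 image, 3x3 kernel
image = np.arange(20, dtype=float).reshape(5, 4)
kernel = np.array([[0., 1., 0.],
                   [1., -4., 1.],
                   [0., 1., 0.]])
T = kernel_to_matrix(kernel, 5, 4)          # shape (3*2, 5*4) = (6, 20)
result = (T @ image.ravel()).reshape(3, 2)  # valid cross-correlation output
```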
The second one: instead of transforming the filter, rearrange the input so that the values covered by each kernel position are stored in a column vector. The kernel, laid out as a row vector, is then multiplied by the matrix holding these columns, like so:
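Again a minimal NumPy sketch of what I mean, using the same toy 5x4 image and 3x3 kernel as above (stride 1, no padding; the function name is made up):

```python
import numpy as np

def image_to_patch_matrix(image, k):
    # Each column holds the k*k input values used by one kernel position.
    H, W = image.shape
    out_h, out_w = H - k + 1, W - k + 1
    P = np.empty((k * k, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            P[:, i * out_w + j] = image[i:i + k, j:j + k].ravel()
    return P

image = np.arange(20, dtype=float).reshape(5, 4)
kernel = np.array([[0., 1., 0.],
                   [1., -4., 1.],
                   [0., 1., 0.]])
P = image_to_patch_matrix(image, 3)                   # shape (9, 6)
result = (kernel.ravel()[None, :] @ P).reshape(3, 2)  # kernel as a row vector
```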
My question: let's assume I have a grayscale image of 960x720 pixels and a kernel of size 3x3.
In the first solution, that would create a massive Toeplitz-like matrix out of the kernel: for stride 1 and no padding its shape is (958 * 718) x (960 * 720), i.e. roughly 5 * 10^11 elements.
In the second solution, the modified input would only have about 6 * 10^6 elements (958 * 718 * 9 for a 3x3 kernel).
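A quick back-of-the-envelope check of those two counts (my own arithmetic, assuming stride 1 and no padding):

```python
H, W, k = 720, 960, 3                   # image height, width, kernel size
out_h, out_w = H - k + 1, W - k + 1     # 718 x 958 valid output positions

toeplitz_elements = (out_h * out_w) * (H * W)  # ~4.75e11 entries, almost all zero
patch_elements    = (out_h * out_w) * (k * k)  # ~6.19e6 entries

print(toeplitz_elements, patch_elements)
```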
I think it is clear that the second option is much faster to compute. Am I missing something, or did I miscalculate? The literature keeps referring to Toeplitz matrices for this problem, but I cannot see what the second solution has in common with them. Does the matrix in the second approach have a specific name?
Is the second one the way to go if I want to implement image convolution? Or is there a more efficient solution, or perhaps a mathematical trick?
Topic matrix linear-algebra convolutional-neural-network convolution
Category Data Science