How to pass a sequence of 4 images into LSTM and CNN-LSTM

I got an assignment and got stuck on it while going down the rabbit hole of learning PyTorch, LSTMs, and CNNs.

Using the well-known MNIST dataset, I take combinations of 4 digits, and each combination falls into one of 7 labels.

For example:

1111: label 1 (constant trend)
1234: label 2 (increasing trend)
4321: label 3 (decreasing trend)
...
7382: label 7 (decreasing trend - increasing trend - decreasing trend)

After loading, my tensor has shape (3, 4, 28, 28): 3 is the batch size, 4 is the channel dimension (the 4 images), and the two 28s are the height and width of an MNIST image.

I'm somewhat stuck on how to pass this into a PyTorch LSTM and CNN, as basically all Google searches lead to articles where only a single image is passed in.
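
For illustration, here is a minimal sketch of one common way to wire this up: treat the 4 images as 4 time steps, so the batch of shape (3, 4, 28, 28) becomes a sequence of 4 feature vectors per sample. The class names SeqLSTM and CNNLSTM, the hidden size, and the layer sizes are all assumptions made for this example, not part of the assignment.

```python
import torch
import torch.nn as nn


class SeqLSTM(nn.Module):
    """Hypothetical LSTM-only model: each image is one time step of 784 pixels."""

    def __init__(self, hidden_size=64, num_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(input_size=28 * 28, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        b, t, h, w = x.shape
        seq = x.reshape(b, t, h * w)   # (B, 4, 784)
        out, _ = self.lstm(seq)        # (B, 4, hidden_size)
        return self.fc(out[:, -1])     # classify from the last time step: (B, 7)


class CNNLSTM(nn.Module):
    """Hypothetical CNN-LSTM: the same small CNN encodes every image, the LSTM reads the 4 encodings."""

    def __init__(self, hidden_size=64, num_classes=7):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
            nn.Flatten(),                                                             # 32 * 7 * 7 = 1568
        )
        self.lstm = nn.LSTM(input_size=32 * 7 * 7, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        b, t, h, w = x.shape
        frames = x.reshape(b * t, 1, h, w)          # fold the time axis into the batch axis
        feats = self.cnn(frames).reshape(b, t, -1)  # (B, 4, 1568)
        out, _ = self.lstm(feats)                   # (B, 4, hidden_size)
        return self.fc(out[:, -1])                  # (B, 7)


x = torch.randn(3, 4, 28, 28)   # stand-in for a real batch
lstm_logits = SeqLSTM()(x)
cnn_logits = CNNLSTM()(x)
print(lstm_logits.shape, cnn_logits.shape)
```

Both models return one score per trend label for each sample, so they could be trained with nn.CrossEntropyLoss against integer labels 0 to 6 (the labels 1 to 7 shifted down by one).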

I was thinking of reshaping it into one long array of pixel values: all the values of the first image row by row (28 values per row), followed by the same for the second, third and fourth images. That would give 4 * 28 * 28 = 3136 values.

Is this a correct way to tackle the problem, or should I rethink it? I'm rather new to all this and looking for guidance on how to go forward. I've been reading loads of articles and watching YouTube videos, but they all seem to cover only the basics or variations on the same subject.
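
To make the reshaping concrete, here is a small sketch comparing the fully flattened layout with a layout that keeps the 4 images as separate steps. The variable names are assumptions for illustration. The flattened form would suit a plain fully connected layer, but as a single 3136-long vector it leaves an LSTM no sequence of images to step through, whereas the (batch, 4, 784) form does.

```python
import torch

# Stand-in batch with distinct values so the ordering can be checked.
x = torch.arange(3 * 4 * 28 * 28, dtype=torch.float32).reshape(3, 4, 28, 28)

flat = x.reshape(x.shape[0], -1)     # (3, 3136): image 1 row by row, then images 2, 3 and 4
seq = x.reshape(x.shape[0], 4, -1)   # (3, 4, 784): one 784-long vector per image

# The first 784 values of the flat layout are exactly the first image.
first_image_matches = torch.equal(flat[:, :784], seq[:, 0])
print(flat.shape, seq.shape, first_image_matches)
```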

Topic mnist pytorch cnn lstm

Category Data Science


CNN-LSTMs are used in text recognition. How is text structured? Characters are stacked horizontally. Do the same with your MNIST images, and for the ground truth make proper labels (each label should have length 4).
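
A rough sketch of how this suggestion might look in PyTorch, assuming the input batch has shape (B, 4, 28, 28) and that "labels of size 4" means one digit label per position. The helper stack_horizontally, the class TinyCRNN, and all layer sizes are hypothetical choices for this example.

```python
import torch
import torch.nn as nn


def stack_horizontally(x):
    """(B, 4, 28, 28) -> (B, 1, 28, 112): the 4 digits side by side, like characters in a word."""
    b, t, h, w = x.shape
    return x.permute(0, 2, 1, 3).reshape(b, 1, h, t * w)


class TinyCRNN(nn.Module):
    """Hypothetical CNN-LSTM: the CNN reads the wide image, the LSTM runs along its width."""

    def __init__(self, hidden_size=64, num_digits=10, steps=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # (16, 14, 56)
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # (32, 7, 28)
            nn.AdaptiveAvgPool2d((1, steps)),                                         # (32, 1, 4)
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden_size,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, num_digits)

    def forward(self, wide):
        feats = self.cnn(wide)                    # (B, 32, 1, 4)
        seq = feats.squeeze(2).permute(0, 2, 1)   # (B, 4, 32): one feature vector per digit slot
        out, _ = self.lstm(seq)                   # (B, 4, 2 * hidden_size)
        return self.fc(out)                       # (B, 4, 10): digit scores per position


x = torch.randn(3, 4, 28, 28)     # stand-in for a real batch
wide = stack_horizontally(x)
digit_logits = TinyCRNN()(wide)
print(wide.shape, digit_logits.shape)
```

With targets of shape (B, 4) holding the four digits, the output could be trained with nn.CrossEntropyLoss after reshaping the logits to (B * 4, 10) and the targets to (B * 4,). To predict the 7 trend labels directly instead, the final layer could be applied to the last time step only.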
