MNIST data shape

In going through different tutorials on CNNs, autoencoders, and so on, I trained myself on the MNIST problem. The images are stored in a 3D array whose shape is (60000, 28, 28). In some tutorials, for the first layer of the CNN they use the Flatten function

keras.layers.Flatten(input_shape=(28, 28))

but in other tutorials, they transform the 3D array into a 4D array of shape (60000, 28, 28, 1), which I suppose is equivalent to using the Flatten function? Am I right? Why are there two different approaches? Does Keras understand both of them?

Topic mnist cnn keras autoencoder image-classification

Category Data Science


It's because of the approach applied:

CNN - We apply 2-D convolutions to the image, hence we need the image in 2-D (height and width, plus a channel axis). The fully connected network only comes at the end, so flattening is done at the end, not at the input.
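A minimal sketch of that pattern using the tf.keras Sequential API (layer sizes here are illustrative, not from the tutorials in question): the convolutions operate on the 2-D image with its channel axis, and Flatten appears only right before the dense classifier.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Convolutions work on (height, width, channels); Flatten comes at the end,
# just before the fully connected classifier.
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),           # MNIST image plus channel axis
    layers.Conv2D(32, 3, activation="relu"),  # 2-D convolution on the image
    layers.MaxPooling2D(),                    # downsample spatial dimensions
    layers.Flatten(),                         # flatten only at the end
    layers.Dense(10, activation="softmax"),   # fully connected classifier
])

# A dummy batch of 4 grayscale images flows through the whole stack.
dummy = np.zeros((4, 28, 28, 1), dtype="float32")
print(model(dummy).shape)  # (4, 10)
```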


A CNN is used to reduce the dimensions of the image without losing the key information. A plain fully connected network becomes too big to train on image data. MNIST images, however, are simple enough that a plain network also works, which is why you see both kinds of approach on MNIST. You will not see a plain fully connected network used on large colour images.

See this excerpt from Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, by Aurélien Géron -
Why not simply use a deep neural network with fully connected layers for image recognition tasks? Unfortunately, although this works fine for small images (e.g., MNIST), it breaks down for larger images because of the huge number of parameters it requires. For example, a 100 × 100–pixel image has 10,000 pixels, and if the first layer has just 1,000 neurons (which already severely restricts the amount of information transmitted to the next layer), this means a total of 10 million connections. And that’s just the first layer. CNNs solve this problem using partially connected layers and weight sharing.
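The arithmetic in that excerpt is easy to check: even one modest dense layer on a 100 × 100 image already needs millions of weights, while the same layer on MNIST-sized input stays manageable.

```python
# Fully connected first layer on a 100x100 grayscale image, as in the excerpt.
pixels = 100 * 100             # 10,000 input pixels
neurons = 1_000                # neurons in the first hidden layer
connections = pixels * neurons
print(connections)             # 10000000 -- 10 million weights, first layer alone

# Compare with MNIST: small enough that a dense net is still practical.
mnist_connections = 28 * 28 * neurons
print(mnist_connections)       # 784000
```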


In a plain neural network one row should be one record, so we flatten. Each pixel then becomes a column (a feature).
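In NumPy terms (sketched here with a dummy array standing in for the MNIST data), flattening turns each 28 × 28 image into a single row of 784 pixel columns:

```python
import numpy as np

# Dummy stand-in for the (60000, 28, 28) MNIST training images.
images = np.zeros((60000, 28, 28))

# One row per record: each 28x28 image becomes a single row of 784 columns.
flat = images.reshape(len(images), -1)
print(flat.shape)  # (60000, 784)
```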


In the first example there are 60000 images of 28 × 28, i.e. 2-D grayscale images. But in order to use a CNN, each image must be 3-dimensional, with height, width, and channel as dimensions. So you have to reshape every 28 × 28 image into a 28 × 28 × 1 image before you can feed it into your CNN layers.
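Adding that channel axis is a one-line reshape (again sketched with dummy data in place of the real MNIST array); `np.expand_dims` and `reshape` both work:

```python
import numpy as np

# Dummy stand-in for the (60000, 28, 28) MNIST array.
x = np.zeros((60000, 28, 28))

# Add a trailing channel axis so each image is 28x28x1 (grayscale).
x_cnn = np.expand_dims(x, axis=-1)  # equivalently: x.reshape(-1, 28, 28, 1)
print(x_cnn.shape)  # (60000, 28, 28, 1)
```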
