Is shuffling data really necessary for training?
I don't mean the case where the dataset, if sampled sequentially, yields labels like [1111122223333]. In that case the network learns to predict everything as 1, then everything as 2, and so on, which makes learning impossible.
I mean: assume you have the ImageNet 2012 dataset and you shuffle it once, so the images (together with their labels) are now in random order. Given how huge the dataset is, can the network really remember its predictions from the previous epoch and overfit?
Or: I shuffle the data 5 times, use one ordering for each of epochs 1, 2, ..., 5, and then reuse shuffled ordering #1 at epoch 6.
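A minimal sketch of the "shuffle 5 times, then cycle" scheme, assuming samples can be addressed by integer index. The helper names `make_orderings` and `ordering_for_epoch` are hypothetical. Each permutation could, for example, be materialized once as its own sequentially stored copy of the data, so that every epoch is still read sequentially.

```python
import random
from typing import List


def make_orderings(n_samples: int, n_orderings: int, seed: int = 0) -> List[List[int]]:
    """Precompute n_orderings independent random permutations of 0..n_samples-1."""
    rng = random.Random(seed)
    orderings = []
    for _ in range(n_orderings):
        perm = list(range(n_samples))
        rng.shuffle(perm)  # in-place Fisher-Yates shuffle
        orderings.append(perm)
    return orderings


def ordering_for_epoch(orderings: List[List[int]], epoch: int) -> List[int]:
    """Return the permutation for a 1-based epoch, cycling through the precomputed list."""
    return orderings[(epoch - 1) % len(orderings)]


if __name__ == "__main__":
    orderings = make_orderings(n_samples=10, n_orderings=5)
    for epoch in range(1, 8):
        print(epoch, ordering_for_epoch(orderings, epoch))
```

With 5 orderings, epoch 6 gets the same permutation as epoch 1 and epoch 7 the same as epoch 2, which is exactly the reuse pattern described above.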
Everybody talks about the importance of shuffling, but I have never read anything that addresses these cases.
BTW, this question came up because I am using a database where sequential access is much faster than random access. If I knew that even a pseudo-shuffle helps, it would save me 6-7 hours per training epoch.
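One common form of pseudo-shuffle for sequential-access storage (named here as an assumption, not something the question specifies) is to read the data in contiguous chunks, shuffle the order of the chunks, and pass items through a fixed-size shuffle buffer, similar in spirit to `tf.data.Dataset.shuffle(buffer_size)`. The sketch below uses a hypothetical function `chunked_pseudo_shuffle` over an in-memory sequence; against a real database, each chunk would correspond to one sequential range read.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked_pseudo_shuffle(
    data: Sequence[T], chunk_size: int, buffer_size: int, seed: int = 0
) -> Iterator[T]:
    """Yield every item of `data` exactly once, in a pseudo-random order.

    Items are only read in contiguous chunks (sequential access within a chunk).
    The chunk order is shuffled, and a shuffle buffer mixes items across
    neighbouring chunks.
    """
    rng = random.Random(seed)
    starts = list(range(0, len(data), chunk_size))
    rng.shuffle(starts)  # random chunk order: one "seek" per chunk
    buffer: List[T] = []
    for start in starts:
        for item in data[start:start + chunk_size]:  # sequential read of one chunk
            buffer.append(item)
            if len(buffer) >= buffer_size:
                # Emit a uniformly chosen buffered item (swap it to the end, then pop).
                i = rng.randrange(len(buffer))
                buffer[i], buffer[-1] = buffer[-1], buffer[i]
                yield buffer.pop()
    rng.shuffle(buffer)  # flush the remaining items in random order
    yield from buffer


if __name__ == "__main__":
    print(list(chunked_pseudo_shuffle(list(range(30)), chunk_size=5, buffer_size=8, seed=42)))
```

Larger chunks mean fewer random seeks but coarser mixing; a larger buffer mixes better at the cost of memory. Passing a different `seed` each epoch gives a different pseudo-random order every time.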