Training using a dataset on NAS server
Hello, I need to store large datasets (potentially several TB) on the local network. However, training on a dataset served from the three different NAS servers I tested consistently took about 4 times longer, while GPU utilization averaged around 25%. I suspect the GPU isn't being fed data fast enough. Playing with num_workers and batch_size didn't make a noticeable difference.

I have two questions: how would you approach storing datasets with millions of small files on a local network? And how would you take that into account in the PyTorch code?
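One idea I'm considering is packing the millions of small files into a few large shard files, so reads over the network become sequential instead of paying per-file open/stat overhead (this is roughly what WebDataset-style tar shards do). Here's a minimal stdlib-only sketch of what I mean, with made-up paths and throwaway sample files; the real dataset class would stream samples out of shards like this:

```python
import os
import tarfile
import tempfile

def pack_shard(file_paths, shard_path):
    """Pack many small files into one tar shard for sequential reads."""
    with tarfile.open(shard_path, "w") as tar:
        for path in file_paths:
            tar.add(path, arcname=os.path.basename(path))

def iter_shard(shard_path):
    """Stream (name, bytes) pairs back out of a shard in archive order."""
    with tarfile.open(shard_path, "r") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()

# Demo with a few throwaway files in a temp dir (illustrative only)
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        p = os.path.join(tmp, f"sample_{i}.bin")
        with open(p, "wb") as f:
            f.write(bytes([i]) * 10)
        paths.append(p)

    shard = os.path.join(tmp, "shard-000.tar")
    pack_shard(paths, shard)
    names = [name for name, _ in iter_shard(shard)]
    print(names)  # ['sample_0.bin', 'sample_1.bin', 'sample_2.bin']
```

Would wrapping something like iter_shard in an IterableDataset (one shard subset per DataLoader worker) be the right direction, or is there a better-established approach?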
Thank you very much!