Training using a dataset on NAS server
Hello, I need to store large datasets (potentially several TB) on the local network. However, training on a dataset served from the three different NAS servers I tested consistently took about 4 times longer, while GPU utilization averaged around 25%. I suspect the GPU isn't being fed data fast enough. Playing with num_workers and batch_size didn't make a noticeable difference.

I have two questions: how would you approach storing datasets with millions of small files on a local network? And how would you take that into account in the PyTorch code?
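One idea I'm considering is packing the millions of small files into a few large shard files, so reads over the network become sequential instead of paying per-file open/stat overhead (this is roughly what WebDataset-style tar shards do). Here's a minimal stdlib-only sketch of what I mean, with made-up paths and throwaway sample files; the real dataset class would stream samples out of shards like this:

```python
import os
import tarfile
import tempfile

def pack_shard(file_paths, shard_path):
    """Pack many small files into one tar shard for sequential reads."""
    with tarfile.open(shard_path, "w") as tar:
        for path in file_paths:
            tar.add(path, arcname=os.path.basename(path))

def iter_shard(shard_path):
    """Stream (name, bytes) pairs back out of a shard in archive order."""
    with tarfile.open(shard_path, "r") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()

# Demo with a few throwaway files in a temp dir (illustrative only)
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        p = os.path.join(tmp, f"sample_{i}.bin")
        with open(p, "wb") as f:
            f.write(bytes([i]) * 10)
        paths.append(p)

    shard = os.path.join(tmp, "shard-000.tar")
    pack_shard(paths, shard)
    names = [name for name, _ in iter_shard(shard)]
    print(names)  # ['sample_0.bin', 'sample_1.bin', 'sample_2.bin']
```

Would wrapping something like iter_shard in an IterableDataset (one shard subset per DataLoader worker) be the right direction, or is there a better-established approach?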
Thank you very much!