Why can distributed deep learning provide higher accuracy (lower error) than non-distributed deep learning in the following cases?
Based on some papers I have read, distributed deep learning can provide faster training time. In addition, it can also provide better accuracy or lower prediction error. What are the reasons for this?
Edit:
I am using TensorFlow to run distributed deep learning (DL) and compare its performance with non-distributed DL. I use a dataset of 1000 samples and 10000 training steps. The distributed DL setup uses 2 workers and 1 parameter server. The following cases are considered when running the code (a minimal sketch of the setup is shown after the list):
Case 1: Each worker and the non-distributed DL use all 1000 samples as the training set, with the same mini-batch size of 200.
Case 2: Each worker uses 500 samples as its training set (the first 500 samples for worker 1 and the remaining 500 for worker 2), while the non-distributed DL uses all 1000 samples, with the same mini-batch size of 200.
Case 3: Each worker uses 500 samples as its training set (the first 500 samples for worker 1 and the remaining 500 for worker 2) with a mini-batch size of 100, while the non-distributed DL uses all 1000 samples with a mini-batch size of 200.
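For context, this is not my exact code, just a minimal TF 1.x-style sketch of a 2-worker / 1-parameter-server setup with asynchronous (between-graph) replication and the Case 2/3 data split; the host addresses, toy linear model, and random data are placeholders for my actual setup.

```python
import numpy as np
import tensorflow as tf

# Hypothetical addresses; each process would be started with its own job_name/task_index.
cluster = tf.train.ClusterSpec({
    "ps":     ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"],
})
job_name, task_index = "worker", 0   # normally passed in via command-line flags
server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "ps":
    server.join()   # the parameter server only hosts the shared variables
else:
    # Case 2/3 split: worker 0 trains on the first 500 samples, worker 1 on the rest.
    x_all = np.random.rand(1000, 1).astype(np.float32)   # placeholder for the real data
    y_all = 3.0 * x_all + 1.0
    start = task_index * 500
    x_part, y_part = x_all[start:start + 500], y_all[start:start + 500]

    # Variables are placed on the parameter server, computation on this worker.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % task_index, cluster=cluster)):
        x = tf.placeholder(tf.float32, [None, 1])
        y = tf.placeholder(tf.float32, [None, 1])
        w = tf.Variable(tf.zeros([1, 1]))
        b = tf.Variable(tf.zeros([1]))
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w) + b - y))   # MSE loss
        global_step = tf.train.get_or_create_global_step()
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
            loss, global_step=global_step)

    hooks = [tf.train.StopAtStepHook(last_step=10000)]   # 10000 training steps
    with tf.train.MonitoredTrainingSession(master=server.target,
                                           is_chief=(task_index == 0),
                                           hooks=hooks) as sess:
        while not sess.should_stop():
            idx = np.random.choice(len(x_part), 100)   # mini-batch of 100 (Case 3)
            sess.run(train_op, {x: x_part[idx], y: y_part[idx]})
```

In Cases 1 and 2 the only changes would be the per-worker data slice and a mini-batch size of 200; the two workers apply their gradient updates to the shared parameters asynchronously.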
Based on the simulation, in all three cases the distributed DL achieves a lower RMSE than the non-distributed DL. Ordered from lowest to highest, the RMSEs are: distributed DL in Case 2 < distributed DL in Case 1 < distributed DL in Case 3 < non-distributed DL.
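By RMSE I mean the usual root-mean-square error between predictions $\hat{y}_i$ and targets $y_i$ over the $N$ evaluation samples:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$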
In addition, I also doubled the training budget for the non-distributed DL (i.e., 2 x 10000 steps), but its results are still not as good as those of the distributed DL.
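Again only a sketch, assuming the same toy model and data placeholders as above: the extended non-distributed baseline differs only in that there is no cluster, all 1000 samples are used with a mini-batch of 200, and the run is stopped after 2 x 10000 steps.

```python
import numpy as np
import tensorflow as tf

x_all = np.random.rand(1000, 1).astype(np.float32)   # placeholder for the real data
y_all = 3.0 * x_all + 1.0

x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) + b - y))
global_step = tf.train.get_or_create_global_step()
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)

hooks = [tf.train.StopAtStepHook(last_step=20000)]   # 2 x 10000 steps for the baseline
with tf.train.MonitoredTrainingSession(hooks=hooks) as sess:
    while not sess.should_stop():
        idx = np.random.choice(1000, 200)   # mini-batch of 200 over all 1000 samples
        sess.run(train_op, {x: x_all[idx], y: y_all[idx]})
```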
One reason could be the mini-batch size; however, I wonder what other reasons explain why the distributed DL performs better in the aforementioned cases.
Topic: tensorflow deep-learning distributed
Category: Data Science