Training speed decreases when adding more GPUs

I am using distributed TensorFlow with MirroredStrategy to train VGG16 on a custom Estimator. However, increasing the number of GPUs increases the training time. As far as I can check, GPU utilization is about 100% and the input function seems able to feed data to the GPUs. All GPUs are in a single machine. Is there any clue to finding out the problem? This is the computation graph, and I am wondering whether the Groups_Deps nodes cause the problem.
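The question does not include the input pipeline, so as a rough sketch only, a tf.data-based input_fn that keeps the GPUs fed for an Estimator typically looks like the following (the file pattern and parse_example are placeholders, not from the question):

    import tensorflow as tf

    def input_fn():
        # Hypothetical file pattern and parse function; replace with your own.
        files = tf.io.gfile.glob("/data/train-*.tfrecord")
        dataset = tf.data.TFRecordDataset(files)
        dataset = dataset.map(parse_example,
                              num_parallel_calls=tf.data.experimental.AUTOTUNE)
        dataset = dataset.shuffle(10000).batch(64)
        # prefetch overlaps host-side preprocessing with GPU compute,
        # so the accelerators are not starved for input.
        return dataset.prefetch(tf.data.experimental.AUTOTUNE)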

Topic: gpu tensorflow distributed

Category: Data Science


Using GPUs can accelerate training, but as you increase the number of GPUs your training has to be distributed, which means your data has to be moved to several GPUs, and that costs bandwidth. I would profile the training and see how much time is spent moving data to the GPUs and getting results back. Plus, it is harder to synchronize training like this.
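One way to do that profiling with a custom Estimator, assuming TensorFlow 1.x, is to attach a tf.train.ProfilerHook so timeline traces are written out; the trace files can then be opened in chrome://tracing to see how much time goes into host-to-device copies and cross-GPU all-reduce ops. A minimal sketch (estimator and train_input_fn stand in for your own objects):

    import tensorflow as tf

    # Write a timeline trace every 100 steps; inspect the JSON files
    # in ./profile with chrome://tracing to see time spent in
    # HostToDevice copies and cross-GPU all-reduce ops.
    profiler_hook = tf.train.ProfilerHook(save_steps=100,
                                          output_dir="./profile",
                                          show_dataflow=True,
                                          show_memory=False)

    estimator.train(input_fn=train_input_fn,
                    hooks=[profiler_hook],
                    max_steps=1000)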

If you are using TensorFlow 1.14+, try changing the distribution method to "MirroredStrategy". I have found this tends to work better with multiple GPUs.
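With a custom Estimator, the strategy is usually passed in through the RunConfig. A sketch of that wiring, assuming TensorFlow 1.14 and a model_fn/input_fn you already have (vgg16_model_fn and train_input_fn are placeholder names):

    import tensorflow as tf

    # Mirror variables across all local GPUs. HierarchicalCopyAllReduce
    # is an alternative to the default NCCL all-reduce that may perform
    # better on some single-machine topologies; measure both.
    strategy = tf.distribute.MirroredStrategy(
        cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

    config = tf.estimator.RunConfig(train_distribute=strategy)

    estimator = tf.estimator.Estimator(model_fn=vgg16_model_fn,
                                       model_dir="./model",
                                       config=config)

    estimator.train(input_fn=train_input_fn, max_steps=1000)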
