Training speed decreases when adding more GPUs
I am using distributed TensorFlow with MirroredStrategy, training VGG16 with a custom Estimator. However, as I increase the number of GPUs, the training time goes up instead of down. As far as I can tell, GPU utilization is around 100% and the input function seems to feed the GPUs fast enough. All GPUs are in a single machine. Is there any clue that could help me find the problem? This is the computation graph, and I am wondering whether the group_deps node is causing the problem.
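For reference, my setup is wired up roughly like the sketch below (TF 1.x Estimator API). The global batch size, the random input tensors, and the small conv stack are placeholders rather than my actual values; the real input_fn reads images from disk and the real model_fn builds the full VGG16, but the MirroredStrategy / RunConfig / Estimator plumbing is the same idea.

```python
import numpy as np
import tensorflow as tf  # written against the TF 1.x Estimator API

# Placeholder global batch size; MirroredStrategy splits it across the local GPUs.
GLOBAL_BATCH_SIZE = 128

def input_fn():
    # Random tensors standing in for the real image pipeline.
    images = np.random.rand(64, 224, 224, 3).astype(np.float32)
    labels = np.random.randint(0, 10, size=(64,)).astype(np.int64)
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    ds = ds.shuffle(64).repeat().batch(GLOBAL_BATCH_SIZE)
    # Prefetch so the CPU prepares the next batch while the GPUs compute.
    return ds.prefetch(tf.data.experimental.AUTOTUNE)

def model_fn(features, labels, mode):
    # Small conv stack standing in for the full VGG16 forward pass.
    net = tf.layers.conv2d(features, 64, 3, padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(net, 2, 2)
    net = tf.layers.flatten(net)
    logits = tf.layers.dense(net, 10)
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

# MirroredStrategy replicates the model on every local GPU and all-reduces the gradients.
strategy = tf.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=strategy)

estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)
estimator.train(input_fn=input_fn, steps=100)
```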
Topic gpu tensorflow distributed
Category Data Science