Horovod vs tf.distribute.MirroredStrategy
I am exploring distributed training using Horovod and tf.distribute.MirroredStrategy.
I have a machine with two GPUs. Using the basic MNIST code (provided in the TensorFlow documentation), I tried to utilize both GPUs for training.
I'm running exactly the same piece of code in both cases and am confused by the usage of hardware resources.
When I'm using Horovod:
```
[0] GeForce RTX 2080 Ti | 43'C, 13 % | 849 / 11019 MB | vipin(841M) gdm(4M)
[1] GeForce RTX 2080 Ti | 40'C, 14 % | 890 / 11014 MB | vipin(841M) gdm(36M) gdm(8M)
```
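For context, the Horovod version follows the standard Horovod Keras MNIST example; this is a minimal sketch of what I ran (the exact script may differ slightly), launched with `horovodrun -np 2 python train_hvd.py`:

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each process to a single GPU and let TensorFlow grow memory
# on demand instead of reserving the whole card up front.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., tf.newaxis] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10),
])

# Scale the learning rate by the number of workers and wrap the
# optimizer so gradients are averaged across processes.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=opt,
    metrics=['accuracy'],
)
model.fit(x_train, y_train, batch_size=64, epochs=1,
          callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)])
```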
And when using tf.distribute.MirroredStrategy:
```
[0] GeForce RTX 2080 Ti | 39'C, 6 % | 10829 / 11019 MB | vipin(10821M) gdm(4M)
[1] GeForce RTX 2080 Ti | 40'C, 7 % | 10826 / 11014 MB | vipin(10777M) gdm(36M) gdm(8M)
```
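The MirroredStrategy version is the standard tf.distribute tutorial code; again a minimal sketch, not the exact script:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print('Number of devices:', strategy.num_replicas_in_sync)

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., tf.newaxis] / 255.0

# Variables and the optimizer must be created under the strategy scope
# so they are mirrored across both GPUs.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=tf.keras.optimizers.SGD(0.01),
        metrics=['accuracy'],
    )

model.fit(x_train, y_train, batch_size=64, epochs=1)
```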
I'm confused by the VRAM (GPU memory) utilization: why does MirroredStrategy consume approximately 10829 MB while Horovod uses approximately 849 MB, even though the batch size is the same? The numbers stay the same even if I increase or decrease the batch size.
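My guess is that TensorFlow by default pre-allocates nearly all memory on every visible GPU, while the Horovod example explicitly enables memory growth, but I'm not sure. For reference, this is the kind of setting I mean (I haven't confirmed whether it changes the behavior above):

```python
import tensorflow as tf

# If memory growth is enabled before any GPU is touched, TensorFlow
# allocates memory on demand instead of reserving ~11 GB per card.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```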
Please suggest what might explain this difference.