Horovod vs tf.distribute.MirroredStrategy
I am exploring distributed training using Horovod and tf.distribute.MirroredStrategy.
I have a machine with two GPUs. Using the basic MNIST code (provided in the TensorFlow documentation), I tried to utilize both GPUs for training.
I'm running exactly the same piece of code in both cases and am confused by the usage of hardware resources.
When I'm using Horovod:
```
[0] GeForce RTX 2080 Ti | 43'C, 13 % | 849 / 11019 MB | vipin(841M) gdm(4M)
[1] GeForce RTX 2080 Ti | 40'C, 14 % | 890 / 11014 MB | vipin(841M) gdm(36M) gdm(8M)
```
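For context, the Horovod version follows the standard Horovod Keras MNIST example; this is a minimal sketch of what I ran (the exact script may differ slightly), launched with `horovodrun -np 2 python train_hvd.py`:

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each process to a single GPU and let TensorFlow grow memory
# on demand instead of reserving the whole card up front.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., tf.newaxis] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10),
])

# Scale the learning rate by the number of workers and wrap the
# optimizer so gradients are averaged across processes.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=opt,
    metrics=['accuracy'],
)
model.fit(x_train, y_train, batch_size=64, epochs=1,
          callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)])
```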
And when using tf.distribute.MirroredStrategy:
```
[0] GeForce RTX 2080 Ti | 39'C, 6 % | 10829 / 11019 MB | vipin(10821M) gdm(4M)
[1] GeForce RTX 2080 Ti | 40'C, 7 % | 10826 / 11014 MB | vipin(10777M) gdm(36M) gdm(8M)
```
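The MirroredStrategy version is the standard tf.distribute tutorial code; again a minimal sketch, not the exact script:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print('Number of devices:', strategy.num_replicas_in_sync)

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., tf.newaxis] / 255.0

# Variables and the optimizer must be created under the strategy scope
# so they are mirrored across both GPUs.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=tf.keras.optimizers.SGD(0.01),
        metrics=['accuracy'],
    )

model.fit(x_train, y_train, batch_size=64, epochs=1)
```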
I'm confused by the VRAM (GPU memory) utilization: why does MirroredStrategy consume approximately 10829 MB while Horovod uses approximately 849 MB, even though the batch size is the same? The numbers stay the same even if I increase or decrease the batch size.
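My guess is that TensorFlow by default pre-allocates nearly all memory on every visible GPU, while the Horovod example explicitly enables memory growth, but I'm not sure. For reference, this is the kind of setting I mean (I haven't confirmed whether it changes the behavior above):

```python
import tensorflow as tf

# If memory growth is enabled before any GPU is touched, TensorFlow
# allocates memory on demand instead of reserving ~11 GB per card.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```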
Please suggest what might explain this difference.