Distributed training with the low-level TensorFlow API
I am using the low-level TensorFlow API for my model training. By low-level I mean that I define a tf.Session() for the graph and evaluate the graph within that session.
I would like to distribute the model training using tf.distribute.MirroredStrategy().
I am able to use MirroredStrategy() with the TensorFlow Sequential (Keras) API, following the example shared in the TensorFlow documentation.
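For reference, this is roughly the Sequential setup that does work for me (a toy model with made-up layer sizes and dummy data, shown only for context):

```python
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Model and optimizer are created inside the strategy scope,
    # so the variables are mirrored on both GPUs.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Dummy data purely for illustration; my real input pipeline is different.
x = np.random.rand(1024, 10).astype(np.float32)
y = np.random.rand(1024, 1).astype(np.float32)
model.fit(x, y, batch_size=64, epochs=2)
```

That version trains on both GPUs as expected.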
However, I am having difficulty running my low-level TensorFlow code with MirroredStrategy.
I tried to use tf.distribute.MirroredStrategy(), and below is the resulting resource utilization:
[0] GeForce RTX 2080 Ti | 48'C, 40 % | 10771 / 11019 MB | vipin(10763M) gdm(4M)
[1] GeForce RTX 2080 Ti | 37'C, 0 % | 10376 / 11014 MB | vipin(10327M) gdm(36M) gdm(8M)
Even though the model allocated memory on both GPUs, GPU 1 utilization is still 0%.
I am not sure what the issue is, or even whether TensorFlow supports this with the low-level API. Please clear my doubts and, if possible, share sample code as well.
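If it helps, here is a stripped-down sketch of the kind of low-level training loop I am trying to distribute. It assumes TensorFlow 1.x graph mode; the model, data and hyperparameters are placeholders, and I am not sure that make_dataset_iterator / experimental_run is the right way to combine MirroredStrategy with tf.Session:

```python
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Dummy input pipeline; replace with the real one.
features = np.random.rand(1024, 10).astype(np.float32)
labels = np.random.rand(1024, 1).astype(np.float32)
GLOBAL_BATCH_SIZE = 64  # the strategy splits this across the GPUs

with strategy.scope():
    dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
               .repeat()
               .batch(GLOBAL_BATCH_SIZE))
    # TF 1.x strategies expose their own iterator helper for graph mode.
    iterator = strategy.make_dataset_iterator(dataset)

    def step_fn(inputs):
        # Built once per replica; variables are mirrored across the GPUs.
        x, y = inputs
        logits = tf.layers.dense(x, 1)
        loss = tf.reduce_mean(tf.squared_difference(logits, y))
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
        with tf.control_dependencies([train_op]):
            return tf.identity(loss)

    per_replica_loss = strategy.experimental_run(step_fn, iterator)
    mean_loss = strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_loss)

with tf.Session() as sess:
    sess.run(iterator.initialize())
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        loss_value = sess.run(mean_loss)
        if step % 20 == 0:
            print(step, loss_value)
```

Is something like this the intended pattern for a Session-based training loop, or does MirroredStrategy effectively require the Keras/Estimator APIs?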