Reducing GPU memory usage when the model is already small
I trained a model and froze it into a PB (protocol buffer) file plus a directory of variables; the total size is about 31 MB. We deployed it on a GPU card, followed this answer, and set per_process_gpu_memory_fraction to a very small value so that the allocated memory would be about 40 MB. The program performs well, but checking GPU usage with nvidia-smi shows a memory usage of about 500 MB.
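For reference, the session was configured roughly like this (a sketch; the file name and graph-loading details are illustrative, and the fraction value is approximate, chosen to give about 40 MB on an 8 GB card):

```python
import tensorflow as tf

# Load the frozen graph from the PB file ("frozen_model.pb" is a placeholder name)
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name="")

# Cap TensorFlow's allocator at a small slice of GPU memory
# (0.005 of an 8 GB card is roughly 40 MB)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.005)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```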
My question is: what explains this gap, and how can we reduce it? Could something like quantization bring the 500 MB down? We want to deploy the model on an edge device, and 500 MB is too large.
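In case it clarifies what we have in mind, a post-training quantization attempt might look like the following (a sketch assuming TF 1.x and the TFLite converter; `frozen_model.pb` and the input/output tensor names are placeholders for our actual graph):

```python
import tensorflow as tf

# Convert the frozen graph to a TFLite model with post-training quantization.
# "input" and "output" stand in for the real input/output tensor names.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="frozen_model.pb",
    input_arrays=["input"],
    output_arrays=["output"],
)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight quantization
tflite_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)
```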
Topic: hardware, gpu, tensorflow
Category: Data Science