Reducing the GPU memory usage when the model is already small enough

I trained a model and froze it into a PB (protocol buffer) file plus a directory of variables; the total size is about 31 MB. We deployed it on a GPU card, followed this answer, and set per_process_gpu_memory_fraction to a very small value so that the allocated memory should be about 40 MB. The program performs very well, but when we check the GPU usage with nvidia-smi, it shows that the memory usage is about 500 MB.
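A minimal sketch of that kind of configuration, assuming the TF 1.x session API (the fraction shown is illustrative, not the exact value used):

```python
import tensorflow as tf

# Cap the per-process GPU allocation to a small fraction of the card's memory
# (TF 1.x API; the fraction below is illustrative, e.g. ~40 MB on an 8 GB card).
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.005
# config.gpu_options.allow_growth = True  # alternative: allocate on demand

with tf.Session(config=config) as sess:
    # load the frozen graph / variables and run inference here
    pass
```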

My question is: how can I explain this gap, and how can we reduce it? Can we do something like quantization to decrease the 500 MB? We want to deploy the model on an edge device, so 500 MB is too large.

Topic: hardware, gpu, tensorflow

Category: Data Science


I cannot explain the gap, but I can help with reducing the memory usage.

One thing you can try is converting your weights to fp16 and running inference in fp16. Don't worry, you will not lose significant numerical precision, especially since you are only using the model for deployment; in fact, this is how the forward pass is done in mixed-precision training. In theory this can both increase speed and reduce memory usage. Hope it helps.
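One common way to do this for edge deployment (an option, not the only one) is TFLite post-training float16 quantization. Here is a minimal sketch, assuming the model can be exported as a SavedModel and the TF 2.x converter API is available; the paths are placeholders:

```python
import tensorflow as tf

# Convert a SavedModel to a TFLite model with weights stored in float16.
# "saved_model_dir" is a placeholder -- the question only mentions a frozen .pb
# plus a variables directory, so the export step may differ in practice.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights as fp16
tflite_fp16_model = converter.convert()

with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_fp16_model)
```

At inference time the resulting .tflite file can be loaded with tf.lite.Interpreter; since the weights are stored in fp16, the model file is roughly half the size of the fp32 version.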
