Why doesn't the GPU utilise system memory?

I have noticed that when training large deep learning models on consumer GPUs (like a GTX 1050 Ti), training often fails.

The reason is that the GPU simply doesn't have enough memory to hold the network. There are proposed workarounds for this, such as using system (CPU) memory to store tensors that are not actively being used by the GPU.

So my question is: is there any way to train models on CUDA with memory drawn from system RAM? I understand that the tradeoff would be speed, but it just might make it possible to train a BERT model on my 1050 Ti.
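To make it concrete, something along these lines is what I have in mind. This is only a minimal sketch, assuming PyTorch >= 1.10 with a CUDA GPU; the model and sizes are made up. The `save_on_cpu` hook stores the activations saved for backward in system RAM and copies them back to the GPU when backward needs them:

```python
import torch

# Minimal sketch of "borrowing" system RAM for activations (assumes PyTorch >= 1.10, CUDA GPU).
# save_on_cpu() moves tensors saved for the backward pass into host memory and copies them
# back to the GPU when backward needs them, trading transfer time for GPU memory.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
).cuda()
x = torch.randn(64, 1024, device="cuda")

with torch.autograd.graph.save_on_cpu(pin_memory=True):
    loss = model(x).sum()
loss.backward()  # slow (PCIe round trips), but the saved activations lived in system RAM
```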

Topic: cuda, hardware, gpu, neural-network, machine-learning

Category: Data Science


When you are training one batch, GPU memory is essentially full, i.e. most of it is actively needed for the computation. Processing one batch takes something like 0.1 second or less. What is your CPU-GPU bandwidth, maybe 2 GB/s? So without losing speed you could move at most about 0.1 GB back and forth per batch... Or, in "I want it anyway" terms, you can extend your memory by 1 GB at the price of roughly a 10x slowdown.
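To put rough numbers on this for your own machine, here is a small sketch that times host-to-device and device-to-host copies. It assumes PyTorch and a CUDA GPU; the 256 MB test buffer is chosen arbitrarily:

```python
import time
import torch

# Rough CPU<->GPU bandwidth check (assumes PyTorch and a CUDA GPU; buffer size is arbitrary).
assert torch.cuda.is_available()
n_bytes = 256 * 1024 * 1024
x_cpu = torch.empty(n_bytes // 4, dtype=torch.float32, pin_memory=True)  # pinned host buffer

torch.cuda.synchronize()
t0 = time.perf_counter()
x_gpu = x_cpu.to("cuda", non_blocking=True)
torch.cuda.synchronize()
print(f"host -> device: {n_bytes / (time.perf_counter() - t0) / 1e9:.1f} GB/s")

t0 = time.perf_counter()
x_back = x_gpu.to("cpu")
torch.cuda.synchronize()
print(f"device -> host: {n_bytes / (time.perf_counter() - t0) / 1e9:.1f} GB/s")
```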

There are techniques that throw away some intermediate layers' activations and recompute them during back-propagation (gradient checkpointing). It is often a much better idea to sacrifice, say, half of the (enormous) GPU compute power to save memory. I have seen reports of people actually using these techniques, but I don't know the details and haven't seen built-in support in ML frameworks.
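For reference, a minimal sketch of this recompute-the-activations idea using PyTorch's `torch.utils.checkpoint` utility; the 8-block model and its sizes here are made up:

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# Gradient-checkpointing sketch (assumes PyTorch with CUDA; the toy 8-block model is made up).
blocks = [torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU()) for _ in range(8)]
model = torch.nn.Sequential(*blocks).cuda()
x = torch.randn(32, 1024, device="cuda", requires_grad=True)

# Split the forward pass into 4 segments: only activations at segment boundaries are kept;
# everything inside a segment is recomputed during backward, trading compute for memory.
out = checkpoint_sequential(model, 4, x)
out.sum().backward()
```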
