OOM error after kernel restart, although training worked before
I ran my CNN on a SageMaker notebook and it started training, but I had to restart the kernel because AWS disconnected. When I tried to rerun my code, I received an OOM error, and training never started again. I tried:
- Restarting the kernel
- Restarting the AWS machine

But the error still persists. I find this strange, since the same code ran before.
ResourceExhaustedError: OOM when allocating tensor with shape[262145,25600] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:RandomUniform]
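For scale, here is a rough back-of-envelope sketch of how much memory the single tensor named in the error would need. It assumes that "type float" means 4-byte float32; the helper name `tensor_bytes` is made up for illustration and is not part of TensorFlow.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: "type float" in the error means float32


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the memory needed to hold one dense tensor of the given shape."""
    return prod(shape) * bytes_per_element


shape = (262145, 25600)  # shape copied from the error message
size = tensor_bytes(shape)
print(f"{size:,} bytes = {size / 2**30:.1f} GiB")
# prints: 26,843,648,000 bytes = 25.0 GiB
```

So this one allocation alone is about 25 GiB, before counting anything else in the model. Since the failing op is `RandomUniform`, it may be the weight initialisation of a very wide layer (for example a dense layer fed by a large flattened input), but that is a guess from the error text alone.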