OOM error after kernel restart, was working before

I ran my CNN in a SageMaker notebook and it started training, but I had to restart the kernel because AWS disconnected. However, when I tried to rerun my code, I received an OOM error and training never started again. I tried:

  • Restarting the kernel
  • Restarting the AWS machine

But the error still persisted. I find this strange because the same code ran before.

ResourceExhaustedError: OOM when allocating tensor with shape[262145,25600] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:RandomUniform]
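As a rough check, you can size the tensor in the error message directly from its shape. This sketch assumes that "type float" means a 32-bit float (4 bytes per element). `tensor_bytes` is a hypothetical helper, not a TensorFlow function. The `RandomUniform` op suggests that the allocation is a layer's weights being initialized, though this is an assumption. One example would be a `Dense` kernel using the default Glorot-uniform initializer. If that is the case, the tensor's size does not depend on the batch size.

```python
# Estimate the memory needed for the tensor named in the OOM message.
# Assumption: "type float" in the TensorFlow error means float32 (4 bytes per element).

def tensor_bytes(shape, bytes_per_element=4):
    """Return the number of bytes a dense tensor of `shape` occupies."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element

shape = (262145, 25600)  # shape copied from the error message
size = tensor_bytes(shape)
print(f"{size:,} bytes = {size / 2**30:.2f} GiB")  # prints: 26,843,648,000 bytes = 25.00 GiB
```

About 25 GiB for a single matrix is more than the memory of many single GPUs (16 GB is a common size).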



You can decrease the batch size of the data in model.fit. For example, if you have set a batch size of 32, you can change it to 16 or 8.
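Here is a rough sketch of why the batch size matters. Per-step activation memory grows linearly with the batch size, while the memory needed for the weights stays fixed. The per-sample activation count below is a made-up placeholder, not a number from the question. Real usage also includes gradients, optimizer state and framework overhead, none of which this sketch models.

```python
# Rough model of how batch size affects GPU memory during training.
# Hypothetical numbers: ACTIVATIONS_PER_SAMPLE is a placeholder, not measured.
BYTES_PER_FLOAT32 = 4
ACTIVATIONS_PER_SAMPLE = 5_000_000   # hypothetical activation count per example
WEIGHT_ELEMENTS = 262145 * 25600     # weight matrix shape from the error message

def activation_gib(batch_size):
    """Activation memory for one forward pass, in GiB (scales with batch size)."""
    return batch_size * ACTIVATIONS_PER_SAMPLE * BYTES_PER_FLOAT32 / 2**30

def weight_gib():
    """Memory for the weight matrix alone, in GiB (independent of batch size)."""
    return WEIGHT_ELEMENTS * BYTES_PER_FLOAT32 / 2**30

for batch_size in (32, 16, 8):
    print(f"batch_size={batch_size:>2}: activations ~{activation_gib(batch_size):.2f} GiB, "
          f"weights ~{weight_gib():.2f} GiB")
```

In Keras, the change itself is the `batch_size` argument. For example, you could call `model.fit(x_train, y_train, batch_size=16)`, where `x_train` and `y_train` stand in for your data. If the failing allocation is a weight tensor rather than activations, lowering the batch size alone will not make it fit.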
