OOM memory after kernel restart, was working before

Question

Finn Williams

January 28, 2022 16:04

I ran my CNN on a SageMaker notebook and it started training, but I had to restart the kernel because AWS disconnected. When I tried to rerun my code, I received an OOM error, and training never started again. I tried:

Restarting the kernel
Restarting the AWS machine

But the error still persisted. I find this strange, since the same code ran before.

ResourceExhaustedError: OOM when allocating tensor with shape[262145,25600] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:RandomUniform]
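For a sense of scale, the size of the tensor named in the error can be estimated with a quick calculation. This is only a rough sketch: it assumes that "float" in the message means 4-byte float32, and the variable names are illustrative.

```python
import math

# Shape copied from the error message.
shape = (262145, 25600)

# Assumption: "type float" means float32, i.e. 4 bytes per element.
bytes_per_element = 4

num_elements = math.prod(shape)
size_bytes = num_elements * bytes_per_element
size_gib = size_bytes / 2**30

print(f"{num_elements:,} elements, about {size_gib:.1f} GiB")
```

Under that assumption, this single allocation comes to roughly 25 GiB.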

Topic sagemaker gpu keras aws

Category Data Science

Γιάννης Αγγελής · Accepted Answer · August 20, 2021 10:10

1

You can decrease the batch size passed to model.fit. For example, if you have set the batch size to 32, you can change it to 16 or 8.
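The suggestion above could be sketched roughly as follows. The helper names candidate_batch_sizes and fit_with_batch_size are hypothetical, and model, x_train and y_train stand in for your own Keras model and in-memory training arrays; only the batch_size and epochs arguments of model.fit are taken from Keras itself.

```python
def candidate_batch_sizes(start=32, smallest=8):
    """Return batch sizes to try, halving from `start` down to `smallest`."""
    sizes = []
    size = start
    while size >= smallest:
        sizes.append(size)
        size //= 2
    return sizes


def fit_with_batch_size(model, x_train, y_train, batch_size, epochs=1):
    """Call a Keras-style model.fit with an explicit batch_size.

    `model` is assumed to be a compiled Keras model (hypothetical here).
    """
    return model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)


print(candidate_batch_sizes())  # [32, 16, 8]
```

With a real model you would then call, for instance, fit_with_batch_size(model, x_train, y_train, batch_size=16). Note that if the data is passed as a tf.data.Dataset or a generator, the batch size is set on the dataset itself instead of in model.fit.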
