Jupyter, Python: the kernel appears to have died while training a model on a large amount of data

I am training my model on almost 200,000 images using Jupyter, and after 3 days of training (800 epochs, batch size = 600) I got the message "The kernel appears to have died. It will restart automatically." This happened after only 143 epochs. Can anyone help me solve this, and can anyone also give me advice on working with a large amount of data? I am struggling with this dataset and I can't retrain the model every time Jupyter freezes. In fact, I'm working on my internship project, so I have to use all of the data. I would be very grateful for your help.

Topic: image, cnn, jupyter, classification, python

Category: Data Science


I have had the same problem while training on huge datasets in Jupyter notebooks. The only solution I found was to put my training process (including model persistence) into a .py script and run it from the terminal (python3 myscript.py).
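A minimal sketch of such a script, assuming a Keras/TensorFlow workflow (the question only says "CNN", so the library choice, file names, data-loading helper, and hyperparameters below are illustrative placeholders, not from the original post):

```python
# train.py -- minimal sketch of running training outside Jupyter,
# with model persistence at the end. Adapt paths and the model to your pipeline.
from tensorflow import keras


def load_dataset():
    # Placeholder: replace with your own image-loading code for the ~200k images.
    return keras.utils.image_dataset_from_directory(
        "data/train", image_size=(128, 128), batch_size=64
    )


def build_model():
    # Placeholder CNN; substitute your actual architecture.
    return keras.Sequential([
        keras.layers.Rescaling(1.0 / 255),
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(10, activation="softmax"),
    ])


if __name__ == "__main__":
    ds = load_dataset()
    model = build_model()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(ds, epochs=800)
    # Persist the trained model so nothing is lost when the process exits.
    model.save("model.keras")
```

You would then launch it with `python3 train.py` from a terminal, so the run does not depend on the notebook kernel staying alive.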


Depending on the library you use, you should be able to create a checkpoint of your model every few iterations so that you don't lose your progress in the event of a crash. If you are unlucky enough to hit a crash, you can resume training from the latest available checkpoint instead of starting from scratch (see the sketch below). Good luck with your internship.
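A minimal sketch of checkpointing, assuming Keras and training via `model.fit` (the checkpoint directory, file names, and epoch numbers are illustrative assumptions; `model` and `train_ds` stand for your own model and dataset):

```python
# Save the full model at the end of every epoch so a crash only costs one epoch.
from tensorflow import keras

checkpoint_cb = keras.callbacks.ModelCheckpoint(
    "checkpoints/epoch_{epoch:03d}.keras",  # one file per epoch
    save_weights_only=False,                # persist the whole model, not just weights
)

# During normal training:
# model.fit(train_ds, epochs=800, callbacks=[checkpoint_cb])

# After a crash, reload the most recent checkpoint and resume where you left off:
# model = keras.models.load_model("checkpoints/epoch_143.keras")
# model.fit(train_ds, initial_epoch=143, epochs=800, callbacks=[checkpoint_cb])
```

The same idea works in other frameworks (e.g. saving a state dict every N iterations in PyTorch); the key point is that the saved state lets you restart from the last checkpoint rather than from epoch 0.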
