How to strategize model training with new data coming in every day?

I have a MySQL database to which new raw records are added every day. This raw data is cleaned, and an ML model is trained on it once a week. What is the best strategy for incorporating the new data into the model without fetching all the records (old and new) and retraining from scratch? I'm saving the model every week with pickle. Can I just fit the previously saved model on the new records? Is this an efficient approach?

Topic sql pandas predictive-modeling databases machine-learning

Category Data Science


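If your model type supports incremental learning, as neural networks do, you can continue training the previously fitted model on just the new examples instead of retraining from scratch, which saves fit time. This is known as online (or incremental) learning. Note that reloading a pickled scikit-learn model and calling fit on it again generally retrains it from scratch. The estimators that support incremental updates expose a partial_fit method for this instead.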
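scikit-learn's incremental estimators (for example `SGDRegressor`, `SGDClassifier` and `MultinomialNB`) expose `partial_fit`, which updates an existing model rather than refitting it. So that the sketch only depends on NumPy, it uses a small hypothetical `OnlineLinearRegressor` built on the same idea. It trains on week 1, pickles the model, reloads it the following week, and updates it with only the new records. The synthetic data, the helper `make_week`, and the drifted coefficient in week 2 are made up for illustration.

```python
import pickle

import numpy as np


class OnlineLinearRegressor:
    """Hypothetical linear model trained by per-sample SGD on squared error.

    Like scikit-learn's partial_fit, each call continues from the current
    weights instead of re-initialising them.
    """

    def __init__(self, n_features, lr=0.01):
        self.lr = lr
        self.w = np.zeros(n_features + 1)  # last entry is the bias

    @staticmethod
    def _augment(X):
        # Append a constant column so the bias is learned like any other weight.
        X = np.asarray(X, dtype=float)
        return np.hstack([X, np.ones((X.shape[0], 1))])

    def partial_fit(self, X, y, epochs=2):
        for _ in range(epochs):
            for xi, yi in zip(self._augment(X), y):
                error = xi @ self.w - yi
                self.w -= self.lr * 2 * error * xi  # gradient step on error ** 2
        return self

    def predict(self, X):
        return self._augment(X) @ self.w


def make_week(rng, coef, bias, n=1000):
    """Synthetic stand-in for one week of cleaned records (illustrative only)."""
    X = rng.normal(size=(n, len(coef)))
    return X, X @ np.asarray(coef) + bias


rng = np.random.default_rng(0)

# Week 1: train and persist the model, as in the question.
X1, y1 = make_week(rng, coef=[2.0, -3.0], bias=1.0)
model = OnlineLinearRegressor(n_features=2).partial_fit(X1, y1)
saved = pickle.dumps(model)  # on disk: with open(path, "wb") as f: pickle.dump(model, f)

# Week 2: reload last week's model and update it with the new records only.
X2, y2 = make_week(rng, coef=[2.5, -3.0], bias=1.0)  # relationship has drifted a little
model = pickle.loads(saved)
model.partial_fit(X2, y2)
print(np.round(model.w, 3))  # close to [2.5, -3.0, 1.0]
```

The week-2 call touches only the new records, and that is where the time saving comes from. The downside is that repeated updates on recent data alone gradually overwrite older patterns. The sampling strategy in the next paragraph can help with that.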

If your model does not support this, you could apply a “decayed conveyor belt” strategy to your data. At each retraining, sample the training set so that the most recent n data points are drawn with higher probability than older ones. In this case, I would also keep a fixed chunk of the training data common to all model versions, as a way to anchor a consistent learned behavior across versions.
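Below is a minimal sketch of the sampling step. It assumes the age of each record in days is known. For the "more probability for recent points" rule it uses an exponential weight `exp(-age / decay_days)` instead of a hard cut-off at the last n points. The function name `decayed_sample`, the `decay_days` value, and the way the anchor chunk is chosen are illustrative assumptions, not a standard method. The anchor rows are drawn once with a fixed seed so that the same chunk appears in every weekly version.

```python
import numpy as np


def decayed_sample(ages_days, n_samples, anchor_idx, decay_days=30.0, seed=None):
    """Return row indices for one training run: anchor rows plus a recency-weighted draw.

    ages_days  : age of every record in days (0 = added today)
    n_samples  : number of non-anchor rows to draw
    anchor_idx : rows shared by every model version
    decay_days : hypothetical decay constant; weight = exp(-age / decay_days)
    """
    rng = np.random.default_rng(seed)
    weights = np.exp(-np.asarray(ages_days, dtype=float) / decay_days)
    weights[anchor_idx] = 0.0  # anchors are always included, so never draw them again
    probs = weights / weights.sum()
    drawn = rng.choice(len(probs), size=n_samples, replace=False, p=probs)
    return np.concatenate([np.asarray(anchor_idx), drawn])


# Toy example: one record per day over the past year (row i is i days old).
ages = np.arange(365)
# Drawn once with a fixed seed; in practice, store these record ids with the model.
anchor = np.random.default_rng(42).choice(len(ages), size=20, replace=False)

idx = decayed_sample(ages, n_samples=100, anchor_idx=anchor, seed=1)
print(len(idx), np.median(ages[idx[len(anchor):]]))  # most drawn rows are recent
```

A smaller `decay_days` makes the belt move faster. A larger one brings the behavior closer to uniform sampling over the full history.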

Beware that training like this may make the model sensitive to seasonal patterns and biased toward recent process drift, even if your data is not a time series. You may need to account for this when sampling, normalizing, or evaluating the models.
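On the evaluation side, one possible check is to report the error per time period instead of as a single overall number. This assumes you keep a held-out set that spans several periods. If a model has drifted toward recent data, its error rises on older periods, and seasonality shows up as a recurring pattern. The helper `error_by_period`, the period labels, and the numbers below are illustrative only.

```python
import numpy as np


def error_by_period(y_true, y_pred, periods):
    """Mean absolute error of held-out predictions, grouped by a period label."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    periods = np.asarray(periods)
    report = {}
    for p in np.unique(periods):
        mask = periods == p
        report[str(p)] = float(np.mean(np.abs(y_true[mask] - y_pred[mask])))
    return report


# Toy held-out predictions from two time windows (made-up numbers).
print(error_by_period(
    y_true=[1.0, 2.0, 3.0, 4.0],
    y_pred=[1.0, 3.0, 3.0, 4.5],
    periods=["older", "older", "recent", "recent"],
))  # {'older': 0.5, 'recent': 0.25}: the older window is fitted worse
```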
