How to retrain an sklearn pipeline with new data?
I have trained and saved a data processing pipeline and an LGBM regressor on 3 months of historical data. I know that I can retrain the LGBM regressor on new data every day by passing my trained model as init_model to the .train function.
How do I retrain my sklearn pipeline that does the data processing using this new data?
One approach I can think of is to monitor feature drift and refit the pipeline on the latest 3 months of data whenever the drift crosses a certain threshold. Is there a better way to do this that I might be missing?
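To make the drift-threshold idea concrete, here is a rough sketch. It assumes the Population Stability Index (PSI) as the drift measure and 0.2 as the threshold, which are common rules of thumb rather than anything stated above; the helper names (psi, drifted_features, maybe_refit) and the imputer/scaler steps are hypothetical stand-ins for the real pipeline.

```python
import numpy as np
from sklearn.base import clone
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index of one numeric feature between two samples."""
    # Interior bin edges come from the reference sample's quantiles.
    interior = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    ref_bins = np.searchsorted(interior, reference, side="right")
    cur_bins = np.searchsorted(interior, current, side="right")
    ref_frac = np.bincount(ref_bins, minlength=n_bins) / len(reference)
    cur_frac = np.bincount(cur_bins, minlength=n_bins) / len(current)
    # Clip so that empty bins do not produce log(0) or division by zero.
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


def drifted_features(X_ref, X_new, threshold=0.2):
    """Indices of columns whose PSI against the reference exceeds the threshold."""
    return [j for j in range(X_ref.shape[1])
            if psi(X_ref[:, j], X_new[:, j]) > threshold]


def maybe_refit(pipeline, X_ref, X_new, X_window, threshold=0.2):
    """Refit a fresh copy of the pipeline on X_window only if drift is detected."""
    if drifted_features(X_ref, X_new, threshold):
        return clone(pipeline).fit(X_window), True
    return pipeline, False


# Synthetic demo: column 1 of the new data is shifted by 1.5 standard deviations.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(2000, 3))
X_same = rng.normal(size=(2000, 3))
X_shift = X_same.copy()
X_shift[:, 1] += 1.5

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
]).fit(X_ref)

_, refit_same = maybe_refit(pipeline, X_ref, X_same, X_same)
new_pipeline, refit_shift = maybe_refit(pipeline, X_ref, X_shift, X_shift)
print(refit_same, refit_shift)
```

One caveat with this approach: once the preprocessing is refitted, the transformed features may no longer line up with the split thresholds the existing booster learned, so it is probably safer to retrain the regressor from scratch at the same time instead of continuing from init_model.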