Incremental Learning with sklearn: warm_start, partial_fit(), fit()
I have built an ML model with the goal of making predictions for targets of the following week. In general, new data will come in and be processed at the end of each week and be in the same data structure as before. In other words, the same number of features, same classes for classification, etc.
Instead of re-training the model from scratch for each week's predictions, I am considering applying an incremental learning approach so that past learning is not entirely discarded and the model would (presumably) increase in performance over time. I'm working with sklearn
on Python 3. There were only a handful of posts on StackOverflow regarding this, but many of the answers seem inconsistent (possibly due to updates with sklearn's API?).
The documentation here and here suggests that incremental/online learning is possible with certain ML implementations - implying that the new datasets could be thought of as mini-batches and incrementally trained by saving/loading the model and calling .partial_fit()
with the same model parameters.
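To make that concrete, here is a minimal sketch of the weekly workflow I have in mind, assuming an estimator that supports `partial_fit` (I'm using `SGDClassifier` and `joblib` purely as an example; the file name and helper are placeholders of mine, not from the docs):

```python
import joblib
import numpy as np
from sklearn.linear_model import SGDClassifier

ALL_CLASSES = np.array([0, 1])  # all classes must be declared on the first partial_fit call

def train_on_week(X_week, y_week, model_path="model.joblib"):
    """Update (or create) the persisted model with this week's mini-batch."""
    try:
        clf = joblib.load(model_path)        # continue from last week's model
    except FileNotFoundError:
        clf = SGDClassifier(random_state=0)  # first week: start from scratch
    # partial_fit updates the existing weights rather than refitting from zero
    clf.partial_fit(X_week, y_week, classes=ALL_CLASSES)
    joblib.dump(clf, model_path)
    return clf
```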
Although not all algorithms can learn incrementally (i.e. without seeing all the instances at once), all estimators implementing the partial_fit API are candidates. 1
Unlike fit, repeatedly calling partial_fit does not clear the model, but updates it with respect to the data provided. The portion of data provided to partial_fit may be called a mini-batch. Each mini-batch must be of consistent shape, etc. In iterative estimators, partial_fit often only performs a single iteration. 2
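My understanding of the quoted behaviour, sketched on synthetic data (the three splits below just stand in for three weeks of new data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=3000, random_state=0)
batches = np.array_split(np.arange(len(X)), 3)   # three "weekly" mini-batches

clf = SGDClassifier(random_state=0)
for idx in batches:
    clf.partial_fit(X[idx], y[idx], classes=np.unique(y))
# clf has now been updated cumulatively with all three mini-batches

clf_refit = SGDClassifier(random_state=0)
for idx in batches:
    clf_refit.fit(X[idx], y[idx])
# clf_refit only reflects the last mini-batch: each fit() discarded the previous model
```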
However, the documentation here is throwing me off.
partial_fit also retains the model between calls, but differs: with warm_start the parameters change and the data is (more-or-less) constant across calls to fit; with partial_fit, the mini-batch of data changes and model parameters stay fixed. 3
There are cases where you want to use warm_start to fit on different, but closely related data. 3
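My reading of (3) is that `warm_start` still performs a full `fit`, just initialized from the previous solution. A small sketch of that interpretation, with two halves of a synthetic dataset standing in for consecutive weeks (the split is my own illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=2000, random_state=0)
X_old, y_old = X[:1000], y[:1000]   # "last week"
X_new, y_new = X[1000:], y[1000:]   # "this week" (closely related data)

clf = SGDClassifier(warm_start=True, random_state=0)
clf.fit(X_old, y_old)   # full fit on the old data
clf.fit(X_new, y_new)   # full fit on the new data, but starting from the previous coefficients
```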
For the problem I am tackling, ideally model parameters should be adjusted based on cross-validation, and new datasets should be weighted more heavily than old ones due to concept drift. However, ignoring this for now:
- In (3), what exactly does "(more-or-less) constant ... different, but closely related data" mean? Since the data structure of the new datasets is the same, should I be calling `estimator(warm_start=True).fit(#new df)` or `estimator.partial_fit(#new df)`?
- For iterative estimators such as `sklearn.linear_model.SGDClassifier`, only one epoch is run when using `.partial_fit()`. If I want $k$ epochs, would calling it on the same dataset repeatedly be the same as calling `.fit()` with $k$ epochs to begin with? (A sketch of the comparison I have in mind follows this list.)
- Do dedicated libraries such as `creme` offer any advantage for incremental learning?
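For the second question, this is roughly the comparison I have in mind (whether the two models end up equivalent is exactly what I'm unsure about):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, random_state=0)
k = 5

# k manual epochs via repeated partial_fit on the same data
clf_pf = SGDClassifier(random_state=0)
for _ in range(k):
    clf_pf.partial_fit(X, y, classes=np.unique(y))

# k epochs via fit (max_iter caps the number of passes over the data)
clf_fit = SGDClassifier(max_iter=k, tol=None, random_state=0)
clf_fit.fit(X, y)
```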
Topic online-learning scikit-learn python machine-learning
Category Data Science