Automating detection of overfitting in models produced by autoML libraries
I'm trying to use machine learning to impute missing data in series, using some auto-ML libraries in Python (so far: dabl, FLAML, auto-sklearn and AutoKeras).
I know the classical way to detect overfitting: plot train/test metrics while tuning the model. However, I see two reasons why I can't use that here:
- first, I'm hoping to tune multiple models (due to physical considerations); that would amount to at least a thousand models, far too many to review manually;
- secondly, the auto-ML libraries return ready-to-use models (with or without logs of some sort); all the metric evaluation is done under the hood.
I therefore chose to split my datasets and compare scores (R² + RMSE) on train/test samples, rejecting a model if (see the sketch after this list):
- the mean RMSE (test + train) is greater than the standard deviation of the series;
- r2_train < 0.7;
- r2_test / r2_train < 0.8.
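For concreteness, here is a minimal sketch of these rejection rules, assuming scikit-learn-style fitted models with a `predict` method; the function name `reject_model` and the exact thresholds are mine, not part of any of the libraries:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def reject_model(model, X_train, y_train, X_test, y_test):
    """Return True if the fitted model trips any of the three rules above."""
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)

    rmse_train = np.sqrt(mean_squared_error(y_train, pred_train))
    rmse_test = np.sqrt(mean_squared_error(y_test, pred_test))
    r2_train = r2_score(y_train, pred_train)
    r2_test = r2_score(y_test, pred_test)

    # Standard deviation of the whole series, same units as the RMSE.
    series_std = np.std(np.concatenate([np.ravel(y_train), np.ravel(y_test)]))

    return (
        (rmse_train + rmse_test) / 2 > series_std  # rule 1
        or r2_train < 0.7                          # rule 2
        or r2_test / r2_train < 0.8                # rule 3
    )
```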
Note: the first test is the only one I could think of in the context of automation (that is, matching the RMSE against a measure expressed in the same units as the pollutants). Maybe there are better courses of action, and in that case I'm open to suggestions.
That being said, those guidelines didn't prevent my algorithms from producing results like the following (apologies for the French legends and axis titles: the Y-axis shows the predictions, the X-axis the true values; "entraînement" in the legend corresponds to the training set):
As you can see, the model is not a good one even though the scores are decent. I'm not even sure this qualifies as a usual overfitting problem, but the model has clearly been trained to memorize two values (around 5.7 and 7.4 pH units). (I can see why those results occurred after peeking at the original dataset; in fact, more than one autoML library produced these anomalies... but that's beside the point.)
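A cheap first check for this particular pathology might be to count how many distinct values the model actually outputs; this helper and its rounding precision are an arbitrary illustration of mine:

```python
import numpy as np

def n_effective_predictions(y_pred, decimals=1):
    """Number of distinct predicted values after rounding; a value
    of ~2 would flag a model like the pH example above."""
    return np.unique(np.round(np.asarray(y_pred), decimals)).size
```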
Is there a way to discard such models in an automated process? I could easily tighten the thresholds used against R² and RMSE to discard this particular model, but I'm not convinced that would catch every such model...
My instinct tells me something involving clustering of the true/predicted values and the standard deviation of each component (inside each cluster) would work, along the lines of the sketch below; but I also suspect it will turn out to be a tedious (and lengthy) task...
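To make the idea concrete, here is one possible sketch using KMeans from scikit-learn; the cluster count and the 0.1 ratio are arbitrary knobs chosen for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def predictions_collapsed(y_true, y_pred, n_clusters=2, ratio=0.1):
    """Heuristic: cluster the predictions and compare, inside each
    cluster, the spread of the predictions to the spread of the
    matching true values. Near-constant predictions over varying
    true values suggest the model memorized a handful of outputs."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        y_pred.reshape(-1, 1)
    )
    for k in range(n_clusters):
        mask = labels == k
        if mask.sum() < 2:
            continue
        if np.std(y_pred[mask]) < ratio * np.std(y_true[mask]):
            return True
    return False
```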
Topic overfitting automation regression
Category Data Science