Estimating the uncertainty of regression models

Question

Maria

February 2, 2022, 23:07

Given a regression model with n features, how can I measure the uncertainty or confidence of the model for each prediction? Suppose the model is very accurate for one specific prediction but not for another. I would like to find a metric that lets me decide, for each frame, whether or not to trust the model.

Topic machine-learning-model azure-ml deep-learning python machine-learning

Category Data Science

Devashish Prasad · Accepted Answer · January 2, 2022, 16:30

To answer this question, I will consider three types of models:

KNN Regression
Regression Trees
Complex models like NN or SVM

KNN Regression

KNN regression is a non-parametric model, and its confidence can be modeled explicitly using the mean absolute error or mean squared error. At test time, we find the K nearest training instances for the given instance. From their target values, and taking into account their average distance to the given instance, we can compute the mean absolute error or mean squared error around the predicted value. A higher mean absolute error or mean squared error means lower confidence in the prediction, and vice versa.
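As a rough illustration of the idea above, here is a minimal sketch using NumPy only. The function name knn_predict_with_spread, the toy data, and the choice of Euclidean distance are assumptions made for this example, not part of the answer. It returns the KNN prediction together with the mean absolute error of the neighbors' targets around that prediction and the mean distance to the neighbors.

```python
import numpy as np

def knn_predict_with_spread(X_train, y_train, x_query, k=5):
    """Return (prediction, local MAE, mean neighbor distance) for one query point."""
    # Euclidean distance from the query to every training instance
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    neighbor_targets = y_train[nearest]
    # KNN regression predicts the average target of the k nearest neighbors
    prediction = neighbor_targets.mean()
    # Spread of the neighbors' targets around the prediction
    local_mae = np.abs(neighbor_targets - prediction).mean()
    return prediction, local_mae, dists[nearest].mean()

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 10.0, size=(200, 1))
# Toy data (assumption): the noise grows with x
y_train = 2.0 * X_train[:, 0] + rng.normal(0.0, 0.1 + 0.3 * X_train[:, 0])

for x in (1.0, 9.0):
    pred, mae, dist = knn_predict_with_spread(X_train, y_train, np.array([x]))
    print(f"x={x}: prediction={pred:.2f}, local MAE={mae:.2f}, mean distance={dist:.3f}")
```

A large local MAE or a large mean distance can both be read as signs of lower confidence. How to combine them into a single score is left open here.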

Regression Trees

When building the tree at training time, each leaf node is assigned the average target value of the training instances that fall into it. At the same time, we can store in each leaf the mean absolute error or mean squared error between that average and the target values of those instances, which is similar to storing the standard deviation of the leaf's target values. At test time, when a given instance reaches a specific leaf node, we get not only the predicted target value but also the mean absolute error or mean squared error we can expect for instances that end up in that leaf. As explained for KNN regression, we can model confidence accordingly.
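One possible way to try this with scikit-learn is sketched below. It assumes a DecisionTreeRegressor with its default squared-error criterion (so each leaf predicts the mean target) and uses the apply method to find the leaf of each sample. The toy data, the hyperparameters, and the helper predict_with_leaf_mae are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 10.0, size=(500, 1))
# Toy data (assumption): the noise grows with x
y_train = 2.0 * X_train[:, 0] + rng.normal(0.0, 0.1 + 0.3 * X_train[:, 0])

tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20, random_state=0)
tree.fit(X_train, y_train)

# apply() returns the index of the leaf that each sample ends up in
train_leaves = tree.apply(X_train)
leaf_mae = {}
for leaf in np.unique(train_leaves):
    targets = y_train[train_leaves == leaf]
    # MAE between the leaf's mean target and the targets of its training instances
    leaf_mae[leaf] = np.abs(targets - targets.mean()).mean()

def predict_with_leaf_mae(x):
    """Return (prediction, stored MAE of the leaf reached) for one instance."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    leaf = tree.apply(x)[0]
    return tree.predict(x)[0], leaf_mae[leaf]

for x in (1.0, 9.0):
    pred, mae = predict_with_leaf_mae([x])
    print(f"x={x}: prediction={pred:.2f}, leaf MAE={mae:.2f}")
```

Because the per-leaf error is computed on the training instances, it is likely to be optimistic. Computing it on held-out instances routed through the same tree is a reasonable variation.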

Complex models like NN or SVM

The simple techniques discussed above can be applied to these models as well. Just as in KNN regression, at test time we can find the K nearest instances to a test instance in the training set and compute the model's mean absolute error or mean squared error on them to estimate confidence. But there is a lot more we can do. You can read more in this thread (https://stats.stackexchange.com/questions/247551/how-to-determine-the-confidence-of-a-neural-network-prediction), where people discuss something very similar that you might find useful.
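The nearest-neighbor idea for an arbitrary model could look roughly like the sketch below. The helper local_error_estimate is hypothetical, and a straight-line fit on quadratic toy data stands in for the complex model, so that its error clearly varies across the input space. Any object with a predict-style function could be plugged in instead.

```python
import numpy as np

def local_error_estimate(predict, X_ref, y_ref, x_query, k=10):
    """Mean absolute error of `predict` on the k reference points nearest to x_query."""
    dists = np.linalg.norm(X_ref - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.abs(predict(X_ref[nearest]) - y_ref[nearest]).mean()

rng = np.random.default_rng(0)
X_train = rng.uniform(-3.0, 3.0, size=(400, 1))
y_train = X_train[:, 0] ** 2 + rng.normal(0.0, 0.1, size=400)

# Stand-in for a complex model (assumption): a straight line fitted to quadratic data
slope, intercept = np.polyfit(X_train[:, 0], y_train, 1)

def predict(X):
    return slope * X[:, 0] + intercept

for x in (0.0, 1.7, 3.0):
    err = local_error_estimate(predict, X_train, y_train, np.array([x]))
    print(f"x={x}: prediction={predict(np.array([[x]]))[0]:.2f}, local MAE={err:.2f}")
```

The reference set here is the training set, as described above. Using a held-out set as the reference would avoid measuring errors on data the model has already seen.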

Erwan · Accepted Answer · January 2, 2022, 13:41

This is evaluation, and it is done experimentally: take a test set of fresh instances with known true target values, apply the model, and measure the error across all the instances (e.g. with MAE, MSE, or RMSE).

Assuming that the test set is a sufficiently large, representative sample of the data, this makes it possible to estimate the quality of the model statistically. For example, we can say that an instance is on average predicted within range $x$ of the true value.

But in general it's impossible to know how good a prediction is for a specific instance: by definition, a model gives its best prediction. If the model were able to know that its prediction is bad, logically it should give a different prediction. Note that if this were possible, it would also be possible to build a near-perfect model iteratively: as long as the prediction is bad, try again.

For the record, there are some task-specific cases where one attempts to estimate the confidence of a supervised model (for example, machine translation (MT) quality estimation). This is done by building a new supervised model in order to predict a confidence score. This new model can also make errors, of course.
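For concreteness, the three error measures mentioned above can be computed on a test set as in this small sketch. The function name regression_errors and the four example values are made up for illustration.

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Return MAE, MSE and RMSE of predictions against true target values."""
    residuals = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mae = np.abs(residuals).mean()
    mse = (residuals ** 2).mean()
    return {"MAE": mae, "MSE": mse, "RMSE": np.sqrt(mse)}

# Hypothetical test-set targets and model predictions
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
print(regression_errors(y_true, y_pred))
```

With these values the residuals are -1, 0, 2 and 1, so the MAE is 1.0 and the MSE is 1.5. In other words, on this toy test set an instance is on average predicted within 1.0 of the true value.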
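A generic version of this second-model idea might look like the sketch below. It is not MT quality estimation itself, only a toy analogue. The data, the split, and the use of simple linear fits for both the base model and the confidence model are all assumptions. The second model is trained on held-out instances to predict the absolute error of the first.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=600)
# Toy data (assumption): the noise grows with x
y = 2.0 * x + rng.normal(0.0, 0.1 + 0.3 * x)

# First half fits the base model, second half fits the confidence model
x_fit, y_fit = x[:300], y[:300]
x_cal, y_cal = x[300:], y[300:]

# Base model: a straight-line fit
base_coef = np.polyfit(x_fit, y_fit, 1)

# Confidence model: predicts the base model's absolute error from the input
abs_err_cal = np.abs(np.polyval(base_coef, x_cal) - y_cal)
err_coef = np.polyfit(x_cal, abs_err_cal, 1)

def predict_with_expected_error(x_new):
    """Return (base prediction, predicted absolute error) for the given inputs."""
    x_new = np.asarray(x_new, dtype=float)
    expected_err = np.maximum(np.polyval(err_coef, x_new), 0.0)  # errors cannot be negative
    return np.polyval(base_coef, x_new), expected_err

preds, errs = predict_with_expected_error([1.0, 9.0])
for xi, p, e in zip((1.0, 9.0), preds, errs):
    print(f"x={xi}: prediction={p:.2f}, expected absolute error={e:.2f}")
```

The predicted error is itself only an estimate, which matches the caveat above that the new model can also make errors.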
