Confidence intervals for evaluation on test set

I'm wondering what the best-practice approach is for finding confidence intervals when evaluating the performance of a classifier on the test set.

As far as I can see, there are two different ways of estimating the uncertainty of a metric like, say, accuracy:

  1. Evaluate it analytically using the formula interval = z * sqrt(error * (1 - error) / n), where n is the sample size, error is the classification error (i.e. 1 - accuracy), and z is the number of Gaussian standard deviations corresponding to the desired confidence level (e.g. z ≈ 1.96 for 95%). A sketch of this is given right after this list.

  2. Split the training data into k folds and train k classifiers, leaving a different fold out each time. Then evaluate all of them on the same test set and compute the mean and variance of the metric.
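
For option 1, here is a minimal sketch in Python (the error rate and sample size are made-up illustrative numbers; z = 1.96 corresponds to roughly a 95% interval):

```python
import math

# Normal-approximation (Wald) interval around the classification error,
# assuming n i.i.d. test examples.
def error_interval(error, n, z=1.96):
    half_width = z * math.sqrt(error * (1.0 - error) / n)
    return error - half_width, error + half_width

# Illustrative numbers: 12% error measured on 2000 test examples.
low, high = error_interval(error=0.12, n=2000)
print(f"approx. 95% CI for the error: [{low:.3f}, {high:.3f}]")
```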

Intuitively, I feel like the latter (sketched below) would give me an estimate of how sensitive the performance is to changes in the training data, whereas the former would allow me to compare two different models directly.
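
A sketch of what I mean by the latter (logistic regression on a synthetic dataset is only a placeholder for the actual model and data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, train_test_split

# Placeholder data and model; any dataset/classifier would do.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Retrain k classifiers, each with a different fold of the training data
# held out, and evaluate every one of them on the same test set.
scores = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X_train):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[train_idx], y_train[train_idx])
    scores.append(model.score(X_test, y_test))

print(f"test accuracy: mean={np.mean(scores):.3f}, std={np.std(scores, ddof=1):.3f}")
```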

I have to say I'm a bit confused...

Tags: uncertainty, confidence, classification, statistics, machine-learning

Category: Data Science


You need to distinguish between the predicted class probability and the uncertainty about that prediction (how confident the model is in its own estimate).

For example, let's say that we are tossing a fair coin: I am 100% confident that the probability of getting "tails" is 50%.

On the other hand, a forecast may say there is a 90% probability that it will rain tomorrow, yet the weatherman is not very certain that this estimate is right.

To get these definitions straight, I recommend reading this paper on aleatoric and epistemic uncertainty: https://arxiv.org/abs/1910.09457

In recent years the tendency has been to use ensemble methods and extract some basic statistics (for example the mean and standard deviation across ensemble members) to compute such an interval.
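
For instance, a minimal sketch of that idea with a bagging ensemble (the dataset and base learner here are just placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder data and base learner.
X, y = make_classification(n_samples=2000, random_state=0)
ensemble = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                             random_state=0).fit(X, y)

# The spread of the per-member predicted probabilities for one example is a
# simple measure of how (un)certain the ensemble is about that prediction.
x_new = X[:1]
member_probs = np.array([m.predict_proba(x_new)[0, 1]
                         for m in ensemble.estimators_])
print(f"P(class 1) = {member_probs.mean():.2f} +/- {member_probs.std(ddof=1):.2f}")
```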
