Bias and variance: in the model or in the predictions?

This topic confuses me. In the literature and in articles, when discussing bias and variance in machine learning, specifically in the context of cross-validation, do authors mean high bias (underfitting) and high variance (overfitting) in the model? Or do they mean the bias and variance of the predictions obtained across the cross-validation iterations? And how should each case be handled?

Topic bias variance cross-validation machine-learning

Category Data Science


Bias and variance describe the predictions of a model, and together they determine whether it is a good one. Since a perfect model (low bias, low variance) does not exist, you often have to choose between a model with high bias/low variance and one with low bias/high variance. This applies to the distribution of the predictions.

Cross-validation gives you a more reliable value for your metrics (like accuracy), because the stochastic nature of model training can produce different metric values each time you run it: some randomness comes from random seeds, and some from the way the train/validation/test split is done.

So it is good practice to use cross-validation: the metrics are averaged over several train/validation splits, and the mean values are more realistic for comparing different cases/models/parameters.
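As a minimal sketch of that averaging idea (the helper names `k_fold_indices`, `cross_validate`, and the majority-class toy model below are my own illustration, not a library API), each fold is held out once, the model is scored on it, and the per-fold scores are averaged:

```python
import random
from statistics import mean, stdev

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, train_and_score, k=5, seed=0):
    """Train/evaluate k times, each time holding out one fold; return the scores."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        scores.append(train_and_score(
            [X[j] for j in train_idx], [y[j] for j in train_idx],
            [X[j] for j in test_idx], [y[j] for j in test_idx]))
    return scores

def majority_classifier(X_tr, y_tr, X_te, y_te):
    """Toy 'model': always predict the majority class seen in training."""
    pred = max(set(y_tr), key=y_tr.count)
    return sum(yt == pred for yt in y_te) / len(y_te)

X = list(range(20))
y = [0] * 12 + [1] * 8
scores = cross_validate(X, y, majority_classifier, k=5)
print(f"mean accuracy: {mean(scores):.2f} +/- {stdev(scores):.2f}")
```

Reporting the mean together with the spread across folds is exactly what makes the comparison between models more trustworthy than a single split.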


I finally understood "Bias and Variance" with Logistic Regression (LR). LR is known to have high bias but low variance. For example, LR accuracy might be only 90% while you'd get 95% with a Decision Tree (which tends to overfit). But with LR in production being fed new data, you're very likely to keep seeing 90% accuracy, while the Decision Tree shows large accuracy swings (variance) on new data (70%-95%).

Hope that helps.


In some cases, you may have a model that's a black box: you feed in input features and get output predictions, without knowing or caring what happens in the middle. In those situations, the model is, in a way, defined by its output; two models that produce the same predictions are indistinguishable from one another, even though they may be entirely distinct models. In these situations, saying a model is biased or has high variance is equivalent to saying the predictions of the model are biased or have high variance, so it is acceptable to describe both the model and its output in terms of bias/variance. Cross-validation gives you an estimate of that bias/variance for your model/predictions.
