Hinge loss question

Hinge loss is usually defined as $$L(y,\hat{y}) = \max(0, 1-y\hat{y})$$

What I don't understand is why we compare zero with $1-y\hat{y}$ rather than with some other constant. Why not use $2-y\hat{y}$, or $\sqrt2-y\hat{y}$, or just $y\hat{y}$ on its own, to check whether the observation is on the right side of the hyperplane? Is there any reason behind '1' as the constant?

Thanks

Topic hinge-loss

Category Data Science


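There's no particular reason for the 1; it's a convention. What matters is that the constant is positive. With zero, i.e. $\max(0, -y\hat{y})$, the loss only asks for each point to be on the right side of the hyperplane, and it is trivially minimized by setting all the weights to zero. A positive constant forces a margin. Any positive constant $c$ gives the same classifier, though: for a linear model, $\max(0, c-y\hat{y}) = c\,\max(0, 1-y\hat{y}/c)$, so using $c$ instead of 1 just rescales the weights by $c$. With a penalty $\lambda\lVert w\rVert^2$, margin $c$ with penalty $\lambda/c$ is the same problem as margin 1 with penalty $\lambda$, up to an overall factor of $c$. So you'd get the same result with a different number everywhere as long as you adjust the regularization, and 1 is simply the most convenient choice.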
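The point that another constant only changes the scale can be checked numerically. Here is a minimal sketch, assuming a linear score $w \cdot x$ with no bias term and a squared L2 penalty `lam * ||w||^2` (neither is specified in the question). The function `hinge_objective` and the random data are made up for illustration.

```python
import random

def hinge_objective(w, data, margin, lam):
    """Sum of hinge losses with the given margin, plus lam * ||w||^2."""
    loss = 0.0
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x))  # linear score w . x
        loss += max(0.0, margin - y * score)
    return loss + lam * sum(wi * wi for wi in w)

random.seed(0)
# 20 random 2-D points with random labels in {-1, +1} (illustrative data only)
data = [([random.gauss(0, 1), random.gauss(0, 1)], random.choice([-1, 1]))
        for _ in range(20)]
w = [random.gauss(0, 1), random.gauss(0, 1)]

c = 2.0    # alternative margin constant
lam = 0.5  # regularization strength used with margin 1

# Margin c, weights c*w, penalty lam/c  ==  c * (margin 1, weights w, penalty lam)
lhs = hinge_objective([c * wi for wi in w], data, margin=c, lam=lam / c)
rhs = c * hinge_objective(w, data, margin=1.0, lam=lam)
print(abs(lhs - rhs) < 1e-9)
```

Because the two objectives differ only by the positive factor `c`, they have the same minimizer up to scaling the weights by `c`, so the decision boundary does not change.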
