What's the relationship between an SVM and hinge loss?

Question

What's the relationship between an SVM and hinge loss?

Simon

2016年2月17日 11:12

My colleague and I are trying to wrap our heads around the difference between logistic regression and an SVM. Clearly they are optimizing different objective functions. Is an SVM as simple as saying it's a discriminative classifier that simply optimizes the hinge loss? Or is it more complex than that? How do the support vectors come into play? What about the slack variables? Why can't you have deep SVM's the way you can't you have a deep neural network with sigmoid activation functions?

Topic hinge-loss logistic-regression svm

Category Data Science

Sean Owen · Accepted Answer · 2016年2月17日 11:12

They are both discriminative models, yes. The logistic regression loss function is conceptually a function of all points. Correctly classified points add very little to the loss function, adding more if they are close to the boundary. The points near the boundary are therefore more important to the loss and therefore deciding how good the boundary is.

SVM uses a hinge loss, which conceptually puts the emphasis on the boundary points. Anything farther than the closest points contributes nothing to the loss because of the "hinge" (the max) in the function. Those closest points are the support vectors, simply. Therefore it actually reduces to picking a boundary that creates the largest margin -- distance to closest point. The theory is that the boundary case is all that really matters to generalization.

The downside is that hinge loss is not differentiable, but that just means it takes more math to discover how to optimize it via Lagrange multipliers. It doesn't really handle the case where data isn't linearly separable. Slack variables are a trick that lets this possibility be incorporated cleanly into the optimization problem.

You can use hinge loss with "deep learning", e.g. http://arxiv.org/pdf/1306.0239.pdf

What's the relationship between an SVM and hinge loss?

About