SVM behavior when regularization parameter equals 0

I read on this Wikipedia page the following about soft-margin SVM:

"The parameter $λ$ determines the trade-off between increasing the margin size and ensuring that the $x_i$ lie on the correct side of the margin. Thus, for sufficiently small values of $λ$, the second term in the loss function will become negligible, hence, it will behave similar to the hard-margin SVM, if the input data are linearly classifiable, but will still learn if a classification rule is viable or not."

I can't understand why, in the case that $\lambda = 0$, the algorithm would behave like a hard-margin SVM. If $\lambda = 0$, it seems to me that the algorithm has no reason to optimize the margin at all. Doesn't it just become a perceptron in that case, since the algorithm only "cares" about classifying all the training data correctly, without reaching any optimal solution regarding the margin?

I'd appreciate a clarification on this issue, please.

Topic hinge-loss regularization svm machine-learning

Category Data Science


First, the remark from the Wikipedia article deals with small (positive) values of $\lambda$, not $\lambda=0$. Indeed, if $\lambda=0$, then every separating hyperplane achieves the minimum loss of 0.
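
Concretely, in the notation of the objective quoted in the question, setting $\lambda = 0$ reduces it to the average hinge loss alone:

$$\frac{1}{n} \sum_{i=1}^{n} \max\left(0,\; 1 - y_i\left(\mathbf{w}^\mathsf{T}\mathbf{x}_i - b\right)\right).$$

Any separating hyperplane, rescaled so that $y_i(\mathbf{w}^\mathsf{T}\mathbf{x}_i - b) \ge 1$ for every $i$, drives this to $0$, so nothing in the objective distinguishes a maximum-margin separator from one that barely separates the data.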

If the data is linearly separable, then taking $\lambda$ small enough that the first term dominates ensures that minimizing the loss requires choosing a separating hyperplane (to zero out the first term), and, subject to that constraint, minimizing the second term $\lambda \lVert \mathbf{w} \rVert^2$ is equivalent to the original hard-margin SVM.
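
A quick way to see this numerically is a minimal sketch with scikit-learn on synthetic, separable data (the blobs, seed, and values of `C` below are purely illustrative). In scikit-learn's `SVC`, the hinge term is scaled by `C` rather than the regularizer by $\lambda$, so a very large `C` plays roughly the role of a very small $\lambda$:

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data: two well-separated Gaussian blobs
# (purely illustrative, not from the question).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([+2.0, 0.0], 0.4, size=(20, 2)),
               rng.normal([-2.0, 0.0], 0.4, size=(20, 2))])
y = np.array([+1] * 20 + [-1] * 20)

# Large C  <->  small lambda in the quoted objective.
for C in (0.001, 1.0, 1e6):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    functional = y * (X @ w + b)                        # y_i * f(x_i)
    violations = int(np.sum(functional < 1))            # points inside the margin
    geo_margin = functional.min() / np.linalg.norm(w)   # worst distance to the hyperplane
    print(f"C={C:g}: margin violations={violations}, geometric margin={geo_margin:.3f}")
```

As `C` grows (i.e. as $\lambda$ shrinks), the margin violations should drop to zero, meaning every training point satisfies $y_i f(x_i) \ge 1$: exactly the hard-margin constraints.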


The reason is that your data is such that the algorithm does not make any mistake on it in the current feature space. It is an easy problem, so the algorithm does not need to ignore any wrongly placed points. If your data turns out to be hard to classify, the soft margin will instead ignore some data points, because insisting on classifying all of them correctly would force a very narrow margin (or no separating hyperplane at all).

It is worth mentioning that even though it may classify the training data equally well, it will not behave like a simple perceptron, at least in most cases. The SVM takes the geometric position of the data points into account when choosing the separating hyperplane, while a simple perceptron only tries to drive its misclassification cost to zero and stops at whatever separating hyperplane it reaches. You can take a look at the pictures which are provided here.
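
A rough sketch of that distinction, again with scikit-learn on synthetic separable data (the blobs and seed are only for illustration): both models below classify the training set perfectly, but they generally end up with different hyperplanes.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC

# Separable toy data: two Gaussian blobs (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([+2.0, 0.0], 0.4, size=(20, 2)),
               rng.normal([-2.0, 0.0], 0.4, size=(20, 2))])
y = np.array([+1] * 20 + [-1] * 20)

models = {
    "perceptron": Perceptron().fit(X, y),
    "linear SVM (C=1e6)": SVC(kernel="linear", C=1e6).fit(X, y),
}

for name, clf in models.items():
    w, b = clf.coef_[0], clf.intercept_[0]
    # Smallest geometric distance from any training point to the decision boundary.
    margin = np.min(y * (X @ w + b)) / np.linalg.norm(w)
    print(f"{name}: train accuracy={clf.score(X, y):.2f}, margin={margin:.3f}")
```

The perceptron stops as soon as it finds any hyperplane with zero training error, so its margin is typically noticeably smaller than the one found by the (near) hard-margin SVM, even though both reach 100% training accuracy.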
