SVM behavior when regularization parameter equals 0

I read on this Wikipedia page the following about soft-margin SVM:

"The parameter $λ$ determines the trade-off between increasing the margin size and ensuring that the $x_i$ lie on the correct side of the margin. Thus, for sufficiently small values of $λ$, the second term in the loss function will become negligible, hence, it will behave similar to the hard-margin SVM, if the input data are linearly classifiable, but will still learn if a classification rule is viable or not."

I can't understand why, in the case that $\lambda = 0$, the algorithm would behave like a hard-margin SVM. If $\lambda = 0$, it seems to me that the algorithm has no reason to optimize the margin at all. Doesn't it just become a perceptron in that case, since the algorithm only "cares" about classifying all the training data correctly, without reaching any optimal solution regarding the margin?

I'd appreciate a clarification on this issue, please.

Topic hinge-loss regularization svm machine-learning

Category Data Science


First, the remark from the Wikipedia article deals with small (positive) values of $\lambda$, not $\lambda=0$. Indeed, if $\lambda=0$, then every separating hyperplane achieves the minimum loss of 0.
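
Concretely, in the notation of the objective quoted in the question, setting $\lambda = 0$ reduces it to the average hinge loss alone:

$$\frac{1}{n} \sum_{i=1}^{n} \max\left(0,\; 1 - y_i\left(\mathbf{w}^\mathsf{T}\mathbf{x}_i - b\right)\right).$$

Any separating hyperplane, rescaled so that $y_i(\mathbf{w}^\mathsf{T}\mathbf{x}_i - b) \ge 1$ for every $i$, drives this to $0$, so nothing in the objective distinguishes a maximum-margin separator from one that barely separates the data.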

If the data is linearly separable, then taking $\lambda$ small enough that the first term dominates ensures that minimizing the loss requires choosing a separating hyperplane (to zero out the first term), and, subject to that constraint, minimizing the second term $\lambda \lVert \mathbf{w} \rVert^2$ is equivalent to the original hard-margin SVM.
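
A quick way to see this numerically is a minimal sketch with scikit-learn on synthetic, separable data (the blobs, seed, and values of `C` below are purely illustrative). In scikit-learn's `SVC`, the hinge term is scaled by `C` rather than the regularizer by $\lambda$, so a very large `C` plays roughly the role of a very small $\lambda$:

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data: two well-separated Gaussian blobs
# (purely illustrative, not from the question).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([+2.0, 0.0], 0.4, size=(20, 2)),
               rng.normal([-2.0, 0.0], 0.4, size=(20, 2))])
y = np.array([+1] * 20 + [-1] * 20)

# Large C  <->  small lambda in the quoted objective.
for C in (0.001, 1.0, 1e6):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    functional = y * (X @ w + b)                        # y_i * f(x_i)
    violations = int(np.sum(functional < 1))            # points inside the margin
    geo_margin = functional.min() / np.linalg.norm(w)   # worst distance to the hyperplane
    print(f"C={C:g}: margin violations={violations}, geometric margin={geo_margin:.3f}")
```

As `C` grows (i.e. as $\lambda$ shrinks), the margin violations should drop to zero, meaning every training point satisfies $y_i f(x_i) \ge 1$: exactly the hard-margin constraints.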


The reason is that your data is such that the algorithm does not make any mistake on it in the current feature space. It is an easy problem, so the algorithm does not need to ignore any wrongly placed points. If your data turns out to be hard to classify, the soft margin will instead ignore some data points, because insisting on classifying all of them correctly would force a very narrow margin (or no separating hyperplane at all).

It is worth mentioning that even though it may classify the training data equally well, it will not behave like a simple perceptron, at least in most cases. The SVM takes the geometric position of the data points into account when choosing the separating hyperplane, while a simple perceptron only tries to drive its misclassification cost to zero and stops at whatever separating hyperplane it reaches. You can take a look at the pictures which are provided here.
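
A rough sketch of that distinction, again with scikit-learn on synthetic separable data (the blobs and seed are only for illustration): both models below classify the training set perfectly, but they generally end up with different hyperplanes.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC

# Separable toy data: two Gaussian blobs (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([+2.0, 0.0], 0.4, size=(20, 2)),
               rng.normal([-2.0, 0.0], 0.4, size=(20, 2))])
y = np.array([+1] * 20 + [-1] * 20)

models = {
    "perceptron": Perceptron().fit(X, y),
    "linear SVM (C=1e6)": SVC(kernel="linear", C=1e6).fit(X, y),
}

for name, clf in models.items():
    w, b = clf.coef_[0], clf.intercept_[0]
    # Smallest geometric distance from any training point to the decision boundary.
    margin = np.min(y * (X @ w + b)) / np.linalg.norm(w)
    print(f"{name}: train accuracy={clf.score(X, y):.2f}, margin={margin:.3f}")
```

The perceptron stops as soon as it finds any hyperplane with zero training error, so its margin is typically noticeably smaller than the one found by the (near) hard-margin SVM, even though both reach 100% training accuracy.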
