Does anyone have any advice on how to implement this loss in order to use it with a convolutional neural network? Also, how should I encode the labels of my training data? We were using one-hot encoding with BCE loss before, and I was wondering whether I should keep it that way for the hinge loss as well, since the label itself is not used in the formula of the loss other than for indicating which one is the true …
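For what it's worth, a minimal numpy sketch of how a multi-class hinge loss could consume the labels, assuming the network outputs raw class scores and the loss only needs the index of the true class (so a one-hot matrix can simply be reduced with argmax); the names here are illustrative, not from the question:

    import numpy as np

    def multiclass_hinge_loss(scores, y_onehot, margin=1.0):
        # scores: (N, C) raw class scores from the network
        # y_onehot: (N, C) one-hot labels; only the position of the 1 is needed
        y = np.argmax(y_onehot, axis=1)              # one-hot -> class indices
        N = scores.shape[0]
        correct = scores[np.arange(N), y][:, None]   # true-class score, shape (N, 1)
        margins = np.maximum(0.0, scores - correct + margin)
        margins[np.arange(N), y] = 0.0               # the true class does not contribute
        return margins.sum() / N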
    def svm_loss_naive(W, X, y, reg):
        """
        Structured SVM loss function, naive implementation (with loops).

        Inputs have dimension D, there are C classes, and we operate on
        minibatches of N examples.

        Inputs:
        - W: A numpy array of shape (D, C) containing weights.
        - X: A numpy array of shape (N, D) containing a minibatch of data.
        - y: A numpy array of shape (N,) containing training labels; y[i] = c
          means that X[i] has label c, where 0 <= c …
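The body of the function is cut off above; as a hedged sketch, a naive loop-based implementation consistent with that docstring typically looks something like the following (a margin of 1 and L2 regularization are assumed; this is an illustration, not the asker's code):

    import numpy as np

    def svm_loss_naive_sketch(W, X, y, reg):
        # Naive multiclass SVM (hinge) loss with explicit loops.
        num_train, num_classes = X.shape[0], W.shape[1]
        loss = 0.0
        for i in range(num_train):
            scores = X[i].dot(W)                      # (C,) class scores for example i
            correct_class_score = scores[y[i]]
            for j in range(num_classes):
                if j == y[i]:
                    continue
                margin = scores[j] - correct_class_score + 1  # delta = 1
                if margin > 0:
                    loss += margin
        loss /= num_train
        loss += reg * np.sum(W * W)                   # L2 regularization term
        return loss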
Hinge loss is usually defined as $$L(y,\hat{y}) = \max(0,\,1-y\hat{y})$$ What I don't understand is why we are comparing zero with $1-y\hat{y}$ instead of some other constant. Why not make it $2-y\hat{y}$, or $\sqrt{2}-y\hat{y}$, or just take $y\hat{y}$ to check whether the observation would be on the right side of the hyperplane? Is there any reason behind '1' as a constant? Thanks
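For concreteness, a few worked evaluations of this formula with made-up numbers: $$L(+1,\,0.3)=\max(0,\,1-0.3)=0.7, \qquad L(+1,\,2.5)=\max(0,\,1-2.5)=0, \qquad L(-1,\,0.3)=\max(0,\,1+0.3)=1.3$$ so a correctly classified point still incurs loss until its margin $y\hat{y}$ reaches 1, and a misclassified point is penalized linearly.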
I read on this Wikipedia page the following about soft-margin SVM: "The parameter $λ$ determines the trade-off between increasing the margin size and ensuring that the $x_i$ lie on the correct side of the margin. Thus, for sufficiently small values of $λ$, the second term in the loss function will become negligible, hence, it will behave similar to the hard-margin SVM, if the input data are linearly classifiable, but will still learn if a classification rule is viable or not." …
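The objective the quoted passage refers to is presumably the usual hinge-loss form of the soft-margin SVM, $\frac{1}{n}\sum_i \max\bigl(0,\,1-y_i(w^\top x_i - b)\bigr) + \lambda\|w\|^2$, with the $\lambda\|w\|^2$ term as the "second term". A minimal numpy sketch of that objective (illustrative names, labels in $\{-1,+1\}$, sign convention taken from that page):

    import numpy as np

    def soft_margin_objective(w, b, X, y, lam):
        # First term: average hinge loss; second term: lam * ||w||^2
        hinge = np.maximum(0.0, 1.0 - y * (X.dot(w) - b)).mean()
        return hinge + lam * np.dot(w, w)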
I hope this doesn't come off as a silly question, but I am looking at SVMs and in principle I understand how they work. The idea is to maximize the margin between different classes of points (in any number of dimensions) as much as possible. So to understand the internal workings of the SVM classification algorithm, I decided to study the cost function, or the hinge loss, first and get an understanding of it... $$L=\frac{1}{N} \sum_{i} \sum_{j \neq y_{i}}\left[\max \left(0, f\left(x_{i} ; …
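As a concrete illustration of the sum this formula computes (made-up scores, margin $\Delta = 1$): for one example with true class $y_i = 0$ and scores $f(x_i) = (3.2,\,5.1,\,-1.7)$, the per-example term is $$\max(0,\,5.1-3.2+1)+\max(0,\,-1.7-3.2+1)=\max(0,\,2.9)+\max(0,\,-3.9)=2.9 .$$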
I'm learning SVMs, and many classic tutorials talk about formulating the SVM problem as a convex optimization problem: i.e., we have an objective function with slack variables, subject to constraints. Most tutorials go through the derivation from this primal formulation to the classic formulation (using Lagrange multipliers, getting the dual form, etc.). As I followed the steps, they eventually made sense after some time of study. But then an important concept for SVM is the hinge loss. …
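For reference, the connection usually drawn between the two views (standard textbook forms, stated here for context): the primal with slack variables $$\min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.}\quad y_i(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,$$ is equivalent to the unconstrained hinge-loss problem $$\min_{w,b}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_i(w^\top x_i + b)\bigr),$$ since at the optimum each slack variable satisfies $\xi_i = \max\bigl(0,\ 1 - y_i(w^\top x_i + b)\bigr)$.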
For knowledge graph completion, it is very common to use a margin-based ranking loss. In the paper, the margin-based ranking loss is defined as $$ \min \sum_{(h,l,t)\in S} \sum_{(h',l,t')\in S'}[\gamma + d(h,l,t) - d(h',l,t')]_+$$ Here $d(\cdot)$ is the predictive model, $(h,l,t)$ is a positive training instance, and $(h',l,t')$ is a negative training instance corresponding to $(h,l,t)$. However, in Andrew's paper, it is defined as $$ \min \sum_{(h,l,t)\in S} \sum_{(h',l,t')\in S'}[\gamma + d(h',l,t') - d(h,l,t)]_+$$ It seems that they switch the terms $d(h',l,t')$ and $d(h,l,t)$. …
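A minimal numpy sketch of the first form of the loss as written above, assuming d_pos and d_neg hold the model's values $d(h,l,t)$ and $d(h',l,t')$ for corresponding positive/negative triples (whether a lower or a higher $d$ means "better" is exactly what the sign question above hinges on; the names are illustrative):

    import numpy as np

    def margin_ranking_loss(d_pos, d_neg, gamma):
        # Sum of [gamma + d(h,l,t) - d(h',l,t')]_+ over corresponding pairs
        return np.sum(np.maximum(0.0, gamma + d_pos - d_neg))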
My colleague and I are trying to wrap our heads around the difference between logistic regression and an SVM. Clearly they are optimizing different objective functions. Is an SVM as simple as saying it's a discriminative classifier that simply optimizes the hinge loss? Or is it more complex than that? How do the support vectors come into play? What about the slack variables? Why can't you have deep SVMs the way you can have a deep neural network with …
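For reference, the per-example objectives usually being contrasted here, with labels $y \in \{-1,+1\}$ and score $f(x) = w^\top x + b$ (standard forms, stated for context): $$\text{hinge (SVM):}\ \ \max\bigl(0,\ 1 - y\,f(x)\bigr), \qquad \text{logistic:}\ \ \log\bigl(1 + e^{-y\,f(x)}\bigr),$$ plus an $\ell_2$ penalty on $w$ in the regularized versions of both.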