How to perform gradient descent on a linear classification problem?
I have a problem, which I have attached as an image.
What I understand
The error function is given by: $e(y, \hat y) = 0$ if $y \cdot a \cdot (x-b) \ge 1$, or $e(y, \hat y) = 1 - y \cdot a \cdot (x-b)$ if $y \cdot a \cdot (x-b) < 1$.
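To make sure I am reading the error function right, here is how I would write it in Python (just a sketch of my own; `point_error` is a name I made up):

```python
def point_error(y, x, a, b):
    """Per-point error e(y, h(x)) for the 1-D classifier h(x) = sign(a * (x - b)).

    The error is 0 when the point meets the margin y * a * (x - b) >= 1,
    and 1 - y * a * (x - b) otherwise.
    """
    margin = y * a * (x - b)
    return 0.0 if margin >= 1 else 1.0 - margin
```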
The current point for gradient descent at iteration $t$ is $(a, b) = (1, 3)$.
The gradient, $\nabla E_{in}(a,b)$, should by definition be the vector of partial derivatives of $E_{in}(a,b)$ with respect to $a$ and $b$ (here $w$ is equivalent to $[a, b]$, as I understand it).
My doubt
Please note that when I say sum over $N$ points, I mean the summation symbol $\sum$ over the $N$ points.
I am not sure how I would calculate the partial derivative in this case. When we are told to fix $a$ and vary $b$, does that mean differentiating only with respect to $b$? That would give $\nabla E_{in} = -\frac{1}{N} \sum_{i=1}^N y_i a$. But this removes the dependence on $x$, and that is why I doubt my approach.
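To sanity-check my expression, I tried comparing it with a numerical partial derivative with respect to $b$ on a toy dataset (the $x$ and $y$ values below are placeholders I made up, not the ones from the attached image):

```python
import numpy as np

def E_in(a, b, xs, ys):
    """In-sample error: average of the per-point hinge errors over all points."""
    margins = ys * a * (xs - b)
    losses = np.where(margins >= 1, 0.0, 1.0 - margins)
    return losses.mean()

# Placeholder data -- NOT the dataset from the attached image.
xs = np.array([1.2, 3.5])
ys = np.array([-1.0, -1.0])
a, b = 1.0, 3.0   # with these values the second point is misclassified, so E_in > 0

# Numerical partial derivative w.r.t. b (a held fixed), via central differences.
eps = 1e-6
dE_db = (E_in(a, b + eps, xs, ys) - E_in(a, b - eps, xs, ys)) / (2 * eps)
print("numerical dE_in/db:", dE_db)
```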
The expression whose derivative needs to be taken is $\frac{1}{N} \sum_{i=1}^N \left(1 - y_i a (x_i - b)\right)$ over the $N$ misclassified points. But according to the dataset, no point is misclassified, since the error for every point at $(a, b) = (1, 3)$ is 0.
For point 1, where $x = 1.2$ and $y = -1$, we have $y \cdot a \cdot (x - b) = (-1) \cdot (1) \cdot (1.2 - 3) = +1.8 \ge 1$, which means that $e(y_1, h(x_1)) = 0$.
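For what it is worth, this is the check I did for that point in Python (only $x = 1.2$, $y = -1$ is from the actual dataset; checking the remaining points would need the values from the attached image):

```python
# Check point 1 from the dataset: x = 1.2, y = -1, with (a, b) = (1, 3).
a, b = 1.0, 3.0
x1, y1 = 1.2, -1
margin = y1 * a * (x1 - b)             # (-1) * 1 * (1.2 - 3) = 1.8
error = 0.0 if margin >= 1 else 1.0 - margin
print(margin, error)                   # margin 1.8 >= 1, so the error is 0
```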
So what should be the answer to this question?
I hope my doubt is clear to the audience, but in any case, please let me know if there is anything I have failed to explain.
Topic homework linear-regression gradient-descent machine-learning
Category Data Science