How to solve gradient descent on a linear classification problem?

I have a problem, which I have attached as an image.

What I understand

The error function is given by: $e(y, \hat y) = 0$ if $y \cdot a(x-b) \ge 1$, or $e(y, \hat y) = 1 - y \cdot a \cdot (x-b)$ if $y \cdot a(x-b) < 1$.
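To make this concrete, here is a minimal sketch of the error in Python (assuming $h_{a,b}(x) = a(x-b)$, which is how I read the problem; the function names are mine):

```python
def score(a, b, x):
    # the hypothesis h_{a,b}(x) = a * (x - b)
    return a * (x - b)

def hinge_error(y, s):
    # e(y, s) = 0 if y*s >= 1, otherwise 1 - y*s
    return 0.0 if y * s >= 1 else 1.0 - y * s
```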

The current gradient-descent iterate at step $t$ is $(a, b) = (1, 3)$.

By definition, the gradient $\nabla E_{in}(a,b)$ should be the vector of partial derivatives of $E_{in}(a,b)$ with respect to $a$ and $b$ ($w$ is equivalent to $[a, b]$, as I understand it).
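In symbols, I take that definition to mean:

$$ \nabla E_{in}(a,b) = \left[ \frac{\partial E_{in}(a,b)}{\partial a},\; \frac{\partial E_{in}(a,b)}{\partial b} \right] $$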

My doubts

Please note that when I say "sum over N points", I mean the Greek summation symbol $\sum_{i=1}^N$ over the N points.

  1. I am not sure how I would calculate the partial derivative in this case. When we say to fix $a$ and vary $b$, does it mean differentiating only with respect to $b$? That would give $\nabla E_{in} = -\frac{1}{N} \sum_{i=1}^N y_i a$. But this removes the dependence on $x$, and that is why I doubt my approach.

  2. The final expression to differentiate is $\frac{1}{N} \sum_{i=1}^N \left(1 - y_i a (x_i - b)\right)$ over the N misclassified points. But as per the dataset, no point is misclassified, since the error function for each point at $(a, b) = (1, 3)$ is equal to 0.

    For point 1, where $x = 1.2$ and $y = -1$, we have $y \cdot a \cdot (x - b) = (-1) \cdot (1) \cdot (1.2 - 3) = +1.8 \ge 1$. This means that $e(y_1, h(x_1)) = 0$ (a quick numeric check is sketched below).
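As a quick numeric check of the above (point 1's values are from the dataset; everything else is just the computation restated):

```python
a, b = 1.0, 3.0      # current weights
x1, y1 = 1.2, -1.0   # point 1 from the dataset
s1 = a * (x1 - b)    # h(x1) = -1.8, so y1 * s1 = +1.8 >= 1
e1 = 0.0 if y1 * s1 >= 1 else 1.0 - y1 * s1
print(e1)            # 0.0 -- point 1 is not misclassified
```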

So what should be the answer to this question?

I hope my doubt is clear to the audience, but in any case please let me know if there is anything I have failed to explain.

Topic: homework, linear-regression, gradient-descent, machine-learning

Category: Data Science


It appears to me that your gradient calculation is off:

$$ \frac{\partial E_{in}(a,b)}{\partial{b}} = \frac{1}{|D|} \sum_i \frac{\partial e(y_i,h_{a,b}(x_i))}{\partial b} $$

where (with some lenient notation):

$$ \frac{\partial e(y_i, h_{a,b}(x_i))}{\partial b} = \frac{\partial e(y,s)}{\partial s} \bigg|_{s = h_{a,b}(x_i)} \cdot \frac{\partial h_{a,b}(x_i)}{\partial b} $$

We can show that:

$$ \frac{\partial e(y,s)}{\partial s} = -y \quad \text{if } 1 - y \cdot s > 0, \qquad \text{and otherwise:} \quad \frac{\partial e(y,s)}{\partial s} = 0 $$

This gives you the expected dependency on $x$, since the value of $x$ determines whether the criterion $1 - y \cdot s > 0$ holds.
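As a minimal sketch in Python (the error here is the hinge form above, $e(y,s) = \max(0, 1 - y \cdot s)$; the function name is mine):

```python
def de_ds(y, s):
    # subgradient of e(y, s) = max(0, 1 - y*s) with respect to the score s
    return -y if 1 - y * s > 0 else 0.0
```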

And:

$$ \frac{\partial h_{a,b}(x_i)}{\partial{b}} = -a $$

So you have:

$$ \frac{\partial E_{in}(a,b)}{\partial b} = \frac{1}{|D|} \sum_i \mathbb{1}_{y_i \cdot a(x_i - b) < 1} \cdot y_i \cdot a $$
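As a sketch, this gradient component could be computed like so (Python/NumPy; the dataset here is a placeholder apart from point 1, which is taken from your question):

```python
import numpy as np

def grad_b(a, b, xs, ys):
    # dE_in/db = (1/|D|) * sum_i 1[y_i * a * (x_i - b) < 1] * y_i * a
    s = a * (xs - b)                      # scores h_{a,b}(x_i)
    active = (ys * s < 1).astype(float)   # indicator: hinge is active
    return np.mean(active * ys * a)

xs = np.array([1.2])   # point 1 from the question; other points omitted
ys = np.array([-1.0])
print(grad_b(1.0, 3.0, xs, ys))  # 0.0, since y*a*(x-b) = 1.8 >= 1
```

If no point satisfies $y_i \cdot a(x_i - b) < 1$, as in your check above, the gradient is 0 and gradient descent would not move $(a, b)$.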
