Understanding the math behind linear classification

Say we have training data $X$, labels $y$ (in $\{-1, +1\}$) and a weight vector $w$.

Our margin is $M_i = y_i \langle w, x_i \rangle$.

If $M_i > 0$ the classifier makes a correct (True) prediction; otherwise, if $M_i < 0$, we get an incorrect (False) prediction.

How does it work? If $\operatorname{sign}(y_i) = \operatorname{sign}(\langle w, x_i \rangle)$, the label and the prediction have the same sign, so their product is always positive, because plus * plus = plus and minus * minus = plus. Otherwise the product is negative and the prediction is False.
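To check that I follow, here is a tiny NumPy sketch of what I think is happening (the data and the weight vector are made up by me):

```python
import numpy as np

# Toy data I made up: 3 samples, 2 features, labels in {-1, +1}
X = np.array([[ 2.0,  1.0],
              [-1.0,  0.5],
              [ 0.5, -2.0]])
y = np.array([ 1, -1, -1])
w = np.array([ 1.0, -0.5])        # some weight vector

M = y * (X @ w)                   # margins M_i = y_i * <w, x_i>
correct = M > 0                   # True where the predicted sign matches y_i
print(M)        # [ 1.5   1.25 -1.5 ]
print(correct)  # [ True  True False]
```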

Let's define a loss function $L(M) = [M < 0]$, i.e. 1 for a misclassified sample and 0 otherwise.
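In code, the way I understand this indicator loss (again my own sketch) is simply:

```python
import numpy as np

def zero_one_loss(M):
    """[M < 0]: 1 for a misclassified sample, 0 otherwise."""
    return (M < 0).astype(float)

M = np.array([1.5, 1.25, -1.5])
print(zero_one_loss(M))         # [0. 0. 1.]
print(zero_one_loss(M).mean())  # empirical error rate, 1/3 here
```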

And what the author of the course suggests is to construct an upper bound on this function and minimize that instead, since we can't minimize the plain $L(M)$ directly.

And this is where the sigmoid (or some other smooth function) comes in.
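For example, if I take the logistic loss $\tilde L(M) = \log_2(1 + e^{-M})$ (I'm not sure which exact function the author uses, so this is just my guess at a typical smooth upper bound), it sits above $[M < 0]$ for every margin:

```python
import numpy as np

def zero_one_loss(M):
    return (M < 0).astype(float)

def logistic_loss(M):
    # log base 2 makes the bound tight at M = 0: log2(1 + e^0) = 1
    return np.log2(1.0 + np.exp(-M))

M = np.linspace(-3, 3, 7)
for m, l01, lup in zip(M, zero_one_loss(M), logistic_loss(M)):
    print(f"M={m:+.1f}  [M<0]={l01:.0f}  log-loss={lup:.3f}")
# The log-loss column is >= the [M<0] column for every M,
# which is what "upper bound" means here.
```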

And now the author says that if we are able to minimize the upper-bound function, then we will also minimize $L(M)$. That sounds plausible to me, but I still have no idea how it actually minimizes the original $L(M)$.

It looks to me like the upper-bound function is symmetric, so if we change its argument, the area under the function stays the same.

Please help me understand this, because it is a really good explanation of how a linear classifier is built and I'm stuck on this point.

Topic linear-models mathematics machine-learning-model classification machine-learning

Category Data Science
