Understanding SVM Kernels

In Andrew Ng's machine learning course, he explains SVM kernels by manually selecting 3 landmarks and defining 3 Gaussian functions based on them. He then says that we are actually defining 3 new features, $f_1$, $f_2$ and $f_3$.


By applying these 3 Gaussian functions to every input point: $$x=(x_1,x_2)\to \hat{x}=(f_1(x), f_2(x), f_3(x))$$

it seems that we are mapping our data from $\mathbb R^2$ to $\mathbb R^3$. Our goal is then to find a hyperplane in this 3-dimensional space in which the transformed data is linearly separable. Is my understanding correct? If not, how should these 3 new features be interpreted?
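For concreteness, here is a small sketch of the mapping I have in mind (the landmark coordinates and $\sigma$ are made-up values, just for illustration):

```python
import numpy as np

# Three manually chosen landmarks in R^2 (made-up coordinates for illustration)
landmarks = np.array([[3.0, 5.0],
                      [1.0, 2.0],
                      [5.0, 1.0]])
sigma = 1.0  # bandwidth of the Gaussian similarity

def gaussian_features(x, landmarks, sigma):
    """Map a point x in R^2 to (f1, f2, f3): one Gaussian similarity per landmark."""
    sq_dists = np.sum((landmarks - x) ** 2, axis=1)  # squared distance to each landmark
    return np.exp(-sq_dists / (2 * sigma ** 2))

x = np.array([2.0, 4.0])
print(gaussian_features(x, landmarks, sigma))  # a 3-vector: the point mapped into R^3
```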


In some blog posts, I have read that by using a Gaussian kernel we are mapping our data to an infinite-dimensional space (where the Gaussian kernel computes the dot product of the transformed input data), which seems to contradict my understanding above.
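As far as I can tell, the argument in those posts goes roughly like this: expanding the exponential writes the Gaussian kernel as an infinite sum of polynomial kernels,

$$k(x,z)=\exp\!\left(-\frac{\|x-z\|^2}{2\sigma^2}\right)=\exp\!\left(-\frac{\|x\|^2}{2\sigma^2}\right)\exp\!\left(-\frac{\|z\|^2}{2\sigma^2}\right)\sum_{n=0}^{\infty}\frac{1}{n!}\left(\frac{x\cdot z}{\sigma^2}\right)^{n},$$

so the corresponding feature map has infinitely many components.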

Tags: kernel, self-study, svm, machine-learning



Yep, this is the correct interpretation. Kernels turn a difficult classification problem into a much simpler one: by transforming the data into a higher-dimensional space, they make it linearly separable. I think this image does a good job of illustrating that. [image: a 2D dataset that is not linearly separable becomes separable after being mapped to a higher dimension]
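As a rough sketch of the same idea (a toy scikit-learn example on a concentric-circles dataset, which may not match the figure above): a linear kernel cannot separate the classes, while the Gaussian (RBF) kernel, which implicitly works in that higher-dimensional feature space, separates them easily.

```python
# Assumes scikit-learn is installed; the dataset and gamma value are illustrative choices.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=1.0).fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))  # near chance level
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))     # near 1.0
```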
