How to choose a kernel function and a feature mapping function?

Although, after extensive reading, I now understand the concepts of support vector machines fairly well, I have trouble translating the kernel function $K$ and the feature mapping function $\phi$ into a simple example such as the following.

My example data $x \in \mathbb{R}^2$: $(1,0), (4,0)$ are from one class, $(2,0), (3,0)$ are from another.

So here are my two questions:

  1. Would $\phi((x_1,x_2))=(x_1,x_2,(x_1-2.5)^2)$ be a wise choice for the mapping function $\phi:\mathbb{R}^2 \to \mathbb{R}^3$ ? If not, what $\phi$ would be a wiser choice?

  2. What would be the corresponding choice for the kernel function $K$?

Topic: kernel, classification, svm

Category: Data Science


To answer your questions:

  1. Yes, $\phi((x_1,x_2))=(x_1,x_2,(x_1-2.5)^2)$ is a good choice. You can also drop the second dimension $x_2$, as it is zero for all of your training examples. Doing so gives a mapping function $\phi:\mathbb{R}^2 \to \mathbb{R}^2$, namely $\phi((x_1,x_2))=(x_1,(x_1-2.5)^2)$. (Also note that mapping into a higher dimension is not necessary; mapping into a space where the data becomes linearly separable is sufficient.)
  2. Kernel function $K$ calculates the dot product of points in the new space, so we have:

$$K(x,y)=\left\langle \phi(x),\phi(y)\right\rangle=\left\langle (x_1,(x_1-2.5)^2),\,(y_1,(y_1-2.5)^2)\right\rangle=x_1y_1+(x_1-2.5)^2(y_1-2.5)^2=x_1y_1+\big((x_1-2.5)(y_1-2.5)\big)^2$$
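As a quick sanity check, here is a minimal sketch (plain Python, no external libraries) that applies the mapping $\phi$ from above to the four training points, confirms that they become linearly separable in the new space, and verifies numerically that the kernel $K$ equals the dot product $\langle \phi(x),\phi(y)\rangle$:

```python
def phi(x):
    """The mapping from the answer, with the all-zero x2 coordinate dropped:
    phi((x1, x2)) = (x1, (x1 - 2.5)**2)."""
    return (x[0], (x[0] - 2.5) ** 2)

def K(x, y):
    """The corresponding kernel: K(x, y) = x1*y1 + ((x1 - 2.5)*(y1 - 2.5))**2."""
    return x[0] * y[0] + ((x[0] - 2.5) * (y[0] - 2.5)) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class_a = [(1, 0), (4, 0)]  # one class
class_b = [(2, 0), (3, 0)]  # the other class

# In the mapped space, the second coordinate alone separates the classes:
# class_a maps to (1, 2.25) and (4, 2.25), class_b to (2, 0.25) and (3, 0.25),
# so e.g. the horizontal line z = 1 is a separating hyperplane.
assert all(phi(x)[1] > 1 for x in class_a)
assert all(phi(x)[1] < 1 for x in class_b)

# The kernel agrees with the dot product in the mapped space for all pairs:
for x in class_a + class_b:
    for y in class_a + class_b:
        assert abs(K(x, y) - dot(phi(x), phi(y))) < 1e-9
```

The assertions passing confirms both claims in the answer: the data is linearly separable after the mapping, and $K$ is the kernel induced by $\phi$.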
