Is it efficient to use the kernel trick in the primal form of an SVM?
I know we can use the kernel trick in the primal form of an SVM. So the hypothesis will be -
and the optimization objective -
We can optimize the above objective using gradient descent. But suppose we use the RBF kernel, which implicitly maps the training data into an infinite-dimensional feature space. If the number of features is infinite, then the dimension of 'w' is also infinite. How is gradient descent supposed to learn 'w' in that case?
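To make the concern concrete, here is a minimal sketch of one way the infinite-dimensional 'w' might be avoided. It assumes 'w' can be written as a weighted sum of the mapped training points (the representer theorem), so gradient descent runs over n coefficients `beta` and only ever touches the kernel matrix. The names `rbf_kernel`, `train_kernel_primal`, `predict`, `lam` and `lr`, the regularized hinge-loss form, and the toy XOR data are my own illustrative choices, not taken from any particular library or from the equations above.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared Euclidean distances, then exp(-gamma * d^2).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def train_kernel_primal(K, y, lam=0.01, lr=0.1, epochs=200):
    # Assumption: w = sum_i beta_i * phi(x_i), so ||w||^2 = beta^T K beta
    # and the decision value on training point j is (K @ beta)[j] + b.
    # Subgradient descent on:
    #   lam/2 * beta^T K beta + mean(max(0, 1 - y * (K @ beta + b)))
    n = K.shape[0]
    beta = np.zeros(n)
    b = 0.0
    for _ in range(epochs):
        margins = y * (K @ beta + b)
        active = (margins < 1).astype(float)  # points inside or violating the margin
        grad_beta = lam * (K @ beta) - K @ (active * y) / n
        grad_b = -(active * y).sum() / n
        beta -= lr * grad_beta
        b -= lr * grad_b
    return beta, b

def predict(X_train, beta, b, X_new, gamma=1.0):
    # Decision values need only kernel evaluations against the training set.
    return np.sign(rbf_kernel(X_new, X_train, gamma) @ beta + b)

# Toy XOR-style data, which is not linearly separable in the input space.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

K = rbf_kernel(X, X)
beta, b = train_kernel_primal(K, y)
pred = predict(X, beta, b, X)
print(pred)
```

In this sketch the learned parameter vector has one entry per training example rather than one per feature, so its size stays finite even though the RBF feature space does not. The cost is that each gradient step works with the full n-by-n kernel matrix.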