Is standardization needed before using scikit-learn SVM?

I am using the SVM function provided by scikit-learn, and I would like to know whether I need to standardize the data before fitting the model. As far as I know, LibSVM tends to require pre-processing the data. I am not sure whether scikit-learn normalizes the data automatically or expects us to handle it ourselves.



By default, for any method that uses gradient descent or combines features (e.g. PCA), I scale my data if the features differ from each other by orders of magnitude. The optimization is easier when the fitted parameters are not too far from zero in any direction. Scaling your data doesn't matter for tree-based methods, since their splits depend only on the ordering of values within each feature.
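A minimal sketch of the per-feature scaling described above, written in plain Python so the arithmetic is visible. The helper name `standardize_columns` and the sample data are hypothetical; the formula matches what scikit-learn's `StandardScaler` does with its default settings (subtract each column's mean, divide by its population standard deviation), but this is an illustration, not that class's implementation.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Return a z-scored copy of `rows` (a list of equal-length feature lists).

    Each column is shifted by its mean and divided by its population
    standard deviation; constant columns are mapped to 0.0.
    """
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in rows
    ]


# Hypothetical data: feature 0 is in the thousands, feature 1 is below 1.
X = [[1000.0, 0.1], [2000.0, 0.2], [3000.0, 0.3]]
Z = standardize_columns(X)

for row in Z:
    print([round(v, 4) for v in row])
```

After scaling, both columns come out as roughly -1.22, 0 and 1.22, so neither feature's units dominate the optimization.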


scikit-learn does not standardize data automatically, but it does offer utilities for standardizing your input data yourself: http://scikit-learn.org/stable/modules/preprocessing.html
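For example, a minimal sketch of that workflow, assuming scikit-learn is installed: `StandardScaler` and `SVC` are chained with `make_pipeline`, and an unscaled `SVC` is fit alongside it for comparison. The breast-cancer dataset bundled with scikit-learn and the particular train/test split are arbitrary choices for illustration; the exact scores depend on the data and the split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# A small dataset that ships with scikit-learn (no download needed);
# its features have very different ranges.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# SVC on the raw features: scikit-learn does not rescale them for you.
raw_svm = SVC().fit(X_train, y_train)

# SVC preceded by an explicit standardization step.
scaled_svm = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)

raw_acc = raw_svm.score(X_test, y_test)
scaled_acc = scaled_svm.score(X_test, y_test)
print(f"unscaled: {raw_acc:.3f}  standardized: {scaled_acc:.3f}")
```

Putting the scaler inside the pipeline means its means and standard deviations are learned from the training split only and then reused on the test split (and inside cross-validation folds), instead of being computed on data the model should not see.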

The rule of thumb is to standardize if your features aren't related. That is, if channel X is not a function of channel Y, you should standardize.

Qualitatively, think about it this way: an SVM "creates a hyperplane" to separate the data into categories. If the data are stretched much further along one axis than the others, that feature dominates the distances the SVM works with, which makes it harder to find a good separating plane.
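One concrete way to see this: the margin an SVM maximizes, and the default RBF kernel of scikit-learn's `SVC`, are both based on Euclidean distances between points. The sketch below measures how much each feature contributes to the squared distance between two points, before and after dividing by a per-feature standard deviation. The points, the `stds` values and the helper name `squared_distance_shares` are all hypothetical, chosen only to illustrate the effect.

```python
def squared_distance_shares(a, b):
    """Fraction of the squared Euclidean distance between a and b contributed by each feature."""
    parts = [(x - y) ** 2 for x, y in zip(a, b)]
    total = sum(parts)
    return [part / total for part in parts]


# Hypothetical points: feature 0 is measured in large units,
# feature 1 in small units.
p = [1000.0, 0.1]
q = [1010.0, 0.9]

# Raw units: the 10-unit gap in feature 0 swamps the 0.8 gap in feature 1.
raw_shares = squared_distance_shares(p, q)

# Hypothetical per-feature standard deviations, as a scaler would estimate
# them from training data. Centering does not change differences between
# points, so only the division matters here.
stds = [500.0, 0.3]
p_scaled = [x / s for x, s in zip(p, stds)]
q_scaled = [x / s for x, s in zip(q, stds)]
scaled_shares = squared_distance_shares(p_scaled, q_scaled)

print("raw:   ", [round(s, 4) for s in raw_shares])
print("scaled:", [round(s, 4) for s in scaled_shares])
```

In raw units feature 0 accounts for over 99% of the squared distance even though, relative to its spread, it barely changes; after scaling, feature 1, which moves by several standard deviations, accounts for almost all of it.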
