Preprocess image data before training a OneClassSVM and reduce the number of features
I want to train a OneClassSVM() with sklearn on a training set of around 800 images.
I am using OpenCV to read the images, resize them to a fixed size (960x540), and add them to a numpy array. The images are RGB, so each one is a 3-D array (height, width, channels). To get a 2-D feature matrix, I reshape the numpy array after reading all the images:
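import numpy as np
# Assume X is a numpy array holding all the images before reshaping,
# e.g. shape (n_samples, 540, 960, 3)
# Flatten each image into a single row
n_samples = len(X)
X = X.reshape(n_samples, -1)  # -1 lets numpy infer 960*540*3 = 1,555,200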
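A quick shape check on a small synthetic batch (two random images standing in for the real data, an assumption) confirms the feature count. It assumes the images were resized with cv2.resize(img, (960, 540)), which takes (width, height) and therefore returns arrays of shape (540, 960, 3). Using reshape(n_samples, -1) lets numpy infer the row length instead of hard-coding it.

```python
import numpy as np

# Hypothetical stand-in for two loaded images of shape (height, width, channels)
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(2, 540, 960, 3), dtype=np.uint8)

n_samples = len(X)
X_flat = X.reshape(n_samples, -1)
print(X_flat.shape)  # (2, 1555200), since 960 * 540 * 3 = 1,555,200

# Each row is one image flattened in C order, so reshaping a row restores the image
assert np.array_equal(X_flat[0].reshape(540, 960, 3), X[0])
```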
As you can see, the number of features is huge: 960*540*3 = 1,555,200 per image.
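The array size alone makes a MemoryError plausible. The back-of-the-envelope sketch below counts bytes for a dense (n_samples, n_features) array. OpenCV returns uint8 pixels (1 byte each), but as far as I know scikit-learn's libsvm-based estimators such as OneClassSVM convert dense input to float64 (8 bytes) before fitting. The 96x54 grayscale row is a hypothetical smaller setting added only for comparison.

```python
import numpy as np

N_SAMPLES = 800


def dataset_bytes(n_samples, n_features, dtype):
    """Bytes needed for a dense (n_samples, n_features) array of the given dtype."""
    return n_samples * n_features * np.dtype(dtype).itemsize


full_features = 540 * 960 * 3  # colour images at 960x540
small_features = 54 * 96       # hypothetical: grayscale, downsampled 10x per side

for label, n_features in [("960x540 RGB", full_features), ("96x54 gray", small_features)]:
    for dtype in (np.uint8, np.float64):
        gb = dataset_bytes(N_SAMPLES, n_features, dtype) / 1e9
        print(f"{label:12s} {np.dtype(dtype).name:8s} {n_features:>9,d} features {gb:8.3f} GB")
```

For 800 images at 960x540 RGB this comes to about 1.24 GB as uint8 and about 9.95 GB once converted to float64, before counting any intermediate copies; the hypothetical 96x54 grayscale version needs about 33 MB as float64.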
Now I try to train my model:
from sklearn.svm import OneClassSVM
model = OneClassSVM(kernel='rbf', gamma=0.001)
model.fit(X)
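Separately from the memory problem, raw pixel values interact badly with a fixed gamma. scikit-learn's RBF kernel is k(x, y) = exp(-gamma * ||x - y||^2), and with 1,555,200 features in [0, 255] the squared distances between clearly different images are enormous, so gamma=0.001 drives their kernel values to 0. The sketch below uses two hypothetical flattened images whose pixels differ by 10 intensity levels everywhere, and compares raw pixels with pixels scaled to [0, 1]. The rbf helper is written out by hand for illustration; sklearn.metrics.pairwise.rbf_kernel computes the same quantity.

```python
import numpy as np

GAMMA = 0.001
N_FEATURES = 960 * 540 * 3


def rbf(x, y, gamma):
    """RBF kernel value exp(-gamma * ||x - y||^2), the form scikit-learn uses."""
    diff = x.astype(np.float64) - y.astype(np.float64)
    return float(np.exp(-gamma * np.dot(diff, diff)))


# Two hypothetical flattened images that differ by 10 intensity levels in every pixel
a = np.zeros(N_FEATURES, dtype=np.uint8)
b = np.full(N_FEATURES, 10, dtype=np.uint8)

raw = rbf(a, b, GAMMA)                     # pixels in [0, 255]: exp(-155520) underflows to 0
scaled = rbf(a / 255.0, b / 255.0, GAMMA)  # pixels in [0, 1]: exp(-2.39...), clearly non-zero
print(raw, scaled)
```

When every off-diagonal kernel value is 0, the model cannot tell similar images from dissimilar ones, so scaling the inputs (or choosing gamma relative to the data, as gamma="scale" does) is worth checking once the memory issue is solved.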
Running this code crashes with a MemoryError. If I'm not mistaken, this is caused by the large number of features. Is there a better way to preprocess the images before fitting, or to reduce the number of features?
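One possible direction, sketched under assumptions rather than offered as a definitive fix: shrink each image as it is loaded (grayscale plus 10x block averaging, both choices made by the hypothetical helper downsample_gray), then let PCA compress further inside a scikit-learn pipeline before the OneClassSVM. Random images stand in for the real OpenCV loading loop; the downsampling factor, n_components=50 and gamma="scale" are placeholders to tune, not recommended values.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import OneClassSVM


def downsample_gray(image, factor=10):
    """Hypothetical helper: shrink one (H, W, 3) uint8 image to a 1-D feature vector.

    Averages the three colour channels into one grayscale channel, scales it to
    [0, 1], then averages non-overlapping factor x factor pixel blocks.
    H and W must be divisible by factor (540 and 960 are divisible by 10).
    """
    h, w, _ = image.shape
    if h % factor or w % factor:
        raise ValueError("image height and width must be divisible by factor")
    gray = image.astype(np.float32).mean(axis=2) / 255.0
    blocks = gray.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3)).ravel()


def synthetic_images(n_images, rng):
    """Stand-in for the real loading loop: random 540x960 colour images, one at a time."""
    for _ in range(n_images):
        yield rng.integers(0, 256, size=(540, 960, 3), dtype=np.uint8)


rng = np.random.default_rng(0)
# Downsample each image as it is produced, so the full-resolution stack never exists
X = np.stack([downsample_gray(img) for img in synthetic_images(60, rng)])
print(X.shape)  # (60, 5184): 54 * 96 features instead of 1,555,200

# PCA compresses further before the RBF one-class SVM (illustrative settings)
model = make_pipeline(
    PCA(n_components=50, svd_solver="randomized", random_state=0),
    OneClassSVM(kernel="rbf", gamma="scale"),
)
model.fit(X)
predictions = model.predict(X)  # +1 = inlier, -1 = outlier
```

With 800 real images the downsampled matrix would be 800 x 5,184; whether grayscale at this resolution keeps enough information for the task has to be checked on the actual data.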
Topic: numpy, preprocessing, scikit-learn, python, machine-learning
Category: Data Science