Pre-process images before training OneClassSVM and decrease the number of features

I want to train a OneClassSVM() using sklearn, and my training set contains around 800 images.

I am using OpenCV to read the images, resize them to constant dimensions (960x540), and add them to a numpy array. The images are RGB, so each one has 3 channels. After reading all the images, I reshape the numpy array:

# Assume X is the numpy array of shape (n_samples, 540, 960, 3)
# holding all the images; now flatten each image into a single row
n_samples = len(X)
X = X.reshape(n_samples, 960*540*3)
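
Roughly, my reading and resizing step looks like this (image_paths stands in for my actual list of file paths):

import cv2
import numpy as np

# cv2.resize takes (width, height); cv2.imread returns BGR arrays
images = [cv2.resize(cv2.imread(p), (960, 540)) for p in image_paths]
X = np.array(images)  # shape: (n_samples, 540, 960, 3)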

As you can see, the number of features is huge (1,555,200 to be exact).

Now I try to train my model:

from sklearn.svm import OneClassSVM

model = OneClassSVM(kernel='rbf', gamma=0.001)
model.fit(X)

After running my code, it crashed with a MemoryError. If I'm not mistaken, this is obviously due to the large number of features? So, is there a better way to pre-process the images before fitting them, or to decrease the number of features?

Topic numpy preprocessing scikit-learn python machine-learning

Category Data Science


One approach is to use an artificial neural network to extract features representing the images. This can be done either by using a pre-configured network with pre-trained weights and extracting the output of one of the hidden layers, or by constructing and training your own network for this purpose.

Using a pre-configured, pre-trained model is easily accomplished with Keras and TensorFlow, where you can import InceptionV3 or MobileNet with weights pre-trained on ImageNet, which would net you 2048 or 1024 features per image, respectively.
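
As a minimal sketch of that idea (not from the original answer; the choice of MobileNet, the 224x224 input size, and the variable names X_raw and features are assumptions):

import numpy as np
from sklearn.svm import OneClassSVM
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.applications.mobilenet import preprocess_input

# pooling='avg' collapses the final feature maps into a single
# 1024-dimensional vector per image
extractor = MobileNet(weights='imagenet', include_top=False, pooling='avg')

# X_raw: array of shape (n_samples, 224, 224, 3) -- images resized to
# MobileNet's default input size rather than 960x540
features = extractor.predict(preprocess_input(X_raw))  # (n_samples, 1024)

model = OneClassSVM(kernel='rbf', gamma=0.001)
model.fit(features)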

An article discussing such an approach can be found here. This could hopefully give you better performance than using something like PCA for dimensionality reduction.


You should try converting the images to principal components using PCA. Please refer to this Analytics Vidhya article on PCA; it should give you a good understanding.

PCA projects the n original features onto p principal components, with p much smaller than n.

The first principal component is a linear combination of the original predictor variables that captures the maximum variance in the data set.

The second principal component is also a linear combination of the original predictor variables; it captures the remaining variance while being uncorrelated with the first. All the succeeding components follow the same concept.

This way you can select the top principal components that explain a good enough share of the cumulative variance in your data.
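
As a minimal sketch (assuming X is the flattened (n_samples, n_features) array from the question; the choice of 200 components is an assumption and should be tuned against the explained variance):

from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

# n_components can be at most n_samples (~800 here); the randomized
# solver avoids forming the full covariance matrix
pca = PCA(n_components=200, svd_solver='randomized')
X_reduced = pca.fit_transform(X)  # shape: (n_samples, 200)

# cumulative share of variance retained by the selected components
print(pca.explained_variance_ratio_.sum())

model = OneClassSVM(kernel='rbf', gamma=0.001)
model.fit(X_reduced)

If X itself is too large to fit in memory, sklearn's IncrementalPCA can compute the decomposition in batches instead.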
