Linear SVM in MATLAB and Python giving different results

I have a particular dataset on which I get different results when using a linear SVM in MATLAB and in scikit-learn.

The data was normalized in MATLAB and imported into Python from a .mat file.

The code used in MATLAB is

acc = 0;
for i = 1:10
   [train,test] = crossvalind('HoldOut',Y,0.2);
   mdl = fitcsvm(X(train,:),Y(train),'KernelFunction','linear');%,'BoxConstraint', 10,'KernelScale',0.001);
   predictions = predict(mdl,X(test,:));
   C = confusionmat(Y(test),predictions);
   acc(i) = (C(1,1)+C(2,2))/((C(1,1)+C(1,2)+C(2,1)+C(2,2)));
end
acc = sum(acc)/10;

The code used in python is

from sklearn import svm
from sklearn.model_selection import StratifiedKFold, cross_val_score

clf_opt = svm.SVC(C=10,gamma=0.001,kernel='linear',random_state=0, tol=1e-5)
clf_opt.fit(X,y)
cvs_svm = cross_val_score(clf_opt,X,y,cv=StratifiedKFold(10)).mean()

With the MATLAB SVM I get an accuracy of around 77%, and with Python around 60%. The parameters C=10 and gamma=0.001 were chosen by running GridSearchCV in Python.

I went through existing posts about differences between linear SVMs in MATLAB and Python, but none of the suggestions helped. I also tried X = StandardScaler().fit_transform(X) in Python, but it changed the accuracy by only 0.5%.

I get comparable classifier accuracies on standard datasets (e.g. Iris); the results differ only on this dataset. The dataset is available at the link below.

https://ufile.io/qs7jy

The link is a compressed file in 'rar' format and contains three files

Python_Dataset_X - Can be loaded with pickle

Python_Dataset_Y - Saved as np array

Matlab_Dataset.mat - Contains the X matrix as table and Y array.

Any assistance would be appreciated.

Topic scikit-learn classification svm libsvm

Category Data Science


In MATLAB, you are using hold-out validation: each iteration splits the data into a training set and a test set (20% held out), and you report the accuracy on the test set.

In Python, you are running 10-fold cross-validation and reporting the mean accuracy over the 10 folds, without a separate hold-out test set.

These two methodologies are not the same. To compare the two implementations fairly, you need the same train-test structure in both.
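As a rough illustration of the point above, here is a minimal Python sketch that runs both evaluation protocols on the same classifier, so the numbers can be compared like for like. Several things here are assumptions and not taken from the question: the data is a synthetic stand-in from make_classification (replace it with the real X and y), the hold-out splits are stratified, and C=1 is used on the assumption that it matches the default BoxConstraint of fitcsvm, since the MATLAB call leaves that option commented out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import (
    StratifiedKFold,
    StratifiedShuffleSplit,
    cross_val_score,
)
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): replace with the real X and y.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# C=1 is assumed to mirror the fitcsvm default; gamma is omitted because
# it has no effect when kernel="linear".
clf = SVC(kernel="linear", C=1.0)

# Protocol A: 10 repeated random 80/20 hold-out splits,
# similar in spirit to the MATLAB loop.
holdout = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
holdout_scores = cross_val_score(clf, X, y, cv=holdout)

# Protocol B: 10-fold stratified cross-validation,
# as in the original Python call.
kfold = StratifiedKFold(n_splits=10)
kfold_scores = cross_val_score(clf, X, y, cv=kfold)

print("repeated hold-out mean accuracy:", holdout_scores.mean())
print("10-fold CV mean accuracy:", kfold_scores.mean())
```

If the two protocols give similar accuracies on the real data, the remaining gap is more likely to come from differing hyperparameters or preprocessing than from the validation scheme. Note also that StratifiedKFold does not shuffle by default, whereas the MATLAB splits are random, so the ordering of the rows could matter.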
