Convert Pandas Dataframe with mixed datatypes to LibSVM format

I have a pandas data frame with about a million rows and 3 columns. The columns have 3 different datatypes: NumberOfFollowers is numerical, UserName is categorical, and Embeddings is a categorical-set type. df:

Index  NumberOfFollowers  UserName  Embeddings     Target Variable
0      15                 name1     [0.5 0.3 0.2]  0
1      4                  name2     [0.4 0.2 0.4]  1
2      8                  name3     [0.5 0.5 0.0]  0
3      10                 name1     [0.1 0.0 0.9]  0
...    ...                ...       ...            ...

I would …
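
One possible route, sketched below under assumptions about the encoding (one-hot for UserName, expanding Embeddings into numeric columns), is to build a numeric matrix and write it out with scikit-learn's dump_svmlight_file, which produces LibSVM/svmlight format. The column names come from the question; everything else is illustrative.

# A minimal sketch, assuming one-hot encoding for the categorical column and
# expansion of the embedding vectors into plain numeric columns.
import numpy as np
import pandas as pd
from sklearn.datasets import dump_svmlight_file

df = pd.DataFrame({
    "NumberOfFollowers": [15, 4, 8, 10],
    "UserName": ["name1", "name2", "name3", "name1"],
    "Embeddings": [[0.5, 0.3, 0.2], [0.4, 0.2, 0.4], [0.5, 0.5, 0.0], [0.1, 0.0, 0.9]],
    "TargetVariable": [0, 1, 0, 0],
})

user_dummies = pd.get_dummies(df["UserName"], prefix="user")        # one-hot encode the categorical column
emb = pd.DataFrame(df["Embeddings"].tolist()).add_prefix("emb_")    # expand the embedding vectors
X = pd.concat([df[["NumberOfFollowers"]], user_dummies, emb], axis=1).to_numpy(dtype=float)
y = df["TargetVariable"].to_numpy()

dump_svmlight_file(X, y, "data.libsvm", zero_based=True)            # writes LibSVM-format text
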
Category: Data Science

How to recover the decision boundary from LIBSVM?

I am trying to recover the decision boundary from the model produced by "svmtrain" of LIBSVM in Octave. The output of the model is shown below; I have highlighted the parameters that I think correspond to the decision boundary equation: This is the decision boundary equation: How do I recover the decision boundary "u" using the equation and the model parameters above? I'd like to do this without calling "svmpredict". Thanks.
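
For reference, LIBSVM's decision value is u = sum_i (sv_coef)_i K(SV_i, x) - rho, which for a linear kernel reduces to u = w.x - rho with w = sum_i (sv_coef)_i SV_i. A minimal numpy sketch of that computation is below; sv_coef, SVs and rho stand in for the fields of the same names in the Octave/MATLAB model structure, and the numbers are placeholders, not the asker's model.

# A minimal sketch, assuming a binary model with a linear kernel.
import numpy as np

SVs = np.array([[1.0, 2.0], [2.0, 1.0]])   # support vectors, one per row (model.SVs)
sv_coef = np.array([0.7, -0.7])            # alpha_i * y_i for each support vector (model.sv_coef)
rho = 0.1                                  # model.rho, so the bias is b = -rho

w = sv_coef @ SVs                          # w = sum_i (alpha_i * y_i) * SV_i
x = np.array([1.5, 1.5])                   # a new point
u = w @ x - rho                            # decision value u = w.x - rho
# The predicted class follows sign(u), subject to LIBSVM's label ordering
# (model.Label(1) corresponds to u > 0).
print(u)
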
Topic: octave libsvm
Category: Data Science

Implementing a weighted support vector machine in python

I have the following problem. The minimization problem of the SVM that I want to solve is: $$ \min_{w, b} \frac{1}{2}w^{T}w + \sum^{m}_{i=1}C_{i}\xi_{i} $$ Subject to: $$ y_{i}(w^{T}x_{i} - b) \geq 1 - \xi_{i} $$ $$ \xi_{i} \geq 0 $$ $$ C_{i} = \nu_{i}C $$ where $\nu_{i}$ is some function. Now the minimization problem that the base SVM solves is: $$ \min_{w, b} \frac{1}{2}w^{T}w + C\sum^{m}_{i=1}\xi_{i} $$ Subject to: $$ y_{i}(w^{T}x_{i} - b) \geq 1 - \xi_{i} $$ $$ \xi_{i} …
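
One way to get per-example costs $C_{i} = \nu_{i}C$ in practice is scikit-learn's SVC (libsvm-based), whose fit method accepts sample_weight and rescales C per sample. A minimal sketch under the assumption that the factors $\nu_{i}$ are available as an array nu (the data here is a placeholder):

# A minimal sketch: effective per-sample cost becomes nu_i * C via sample_weight.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                     # toy data standing in for the real problem
y = (X[:, 0] + 0.1 * rng.normal(size=100) > 0).astype(int)
nu = rng.uniform(0.5, 2.0, size=100)              # placeholder for the nu_i values

C = 1.0
clf = SVC(kernel="linear", C=C)
clf.fit(X, y, sample_weight=nu)                   # effective cost for example i is nu_i * C
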
Topic: svm python libsvm
Category: Data Science

SVM SVC: Metric for parameter optimization on imbalanced data

I trained a multiclass SVC with an RBF kernel on a down-sampled (and therefore balanced) dataset. Now I want to perform a grid search to find the best cost and gamma. What performance metric should I optimize for? I have a highly imbalanced test set. There might be a factor of over 100 between the number of instances of different classes. I am classifying 3D points (car, facade, human), so I think one could assign equal weight to all classes.
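
If all classes should count equally, a macro-averaged metric (macro F1, or balanced accuracy) is a common choice for the grid search. A hedged sketch with scikit-learn's GridSearchCV follows; the parameter ranges and data are placeholders, not the asker's setup.

# A minimal sketch of grid search over cost and gamma with a class-balanced metric.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))                      # stand-in for the 3D point features
y = rng.integers(0, 3, size=300)                   # stand-in labels: car / facade / human

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="f1_macro", cv=5)  # or scoring="balanced_accuracy"
search.fit(X, y)
print(search.best_params_, search.best_score_)
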
Category: Data Science

Prepare data for SVM: is it valid to normalise the data before and after PCA dimension reduction?

Is it valid to normalise a dataset, reduce its dimensionality with PCA, and then normalise the reduced-dimension data? Assuming this is performed on the training data, should the same PCA coefficients be used to reduce the dimension of the test data? Should the same max and min normalisation values be used for the test and training data? I have included a simplified example of the code I am using, which may describe what I mean better. Thanks in advance. %% Prepare …
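
The usual recipe is to fit every preprocessing step (scaler, PCA, second scaler) on the training data only and reuse the fitted parameters on the test data. The question's own code is MATLAB; purely as an illustration of the idea, a scikit-learn Pipeline sketch with placeholder data:

# A minimal sketch: scaler -> PCA -> scaler -> SVM, all fitted on the training
# set only, then reused unchanged on the test set.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(80, 20)), rng.integers(0, 2, size=80)
X_test = rng.normal(size=(20, 20))

model = Pipeline([
    ("scale_raw", MinMaxScaler()),      # min/max learned from training data only
    ("pca", PCA(n_components=5)),       # PCA coefficients learned from training data only
    ("scale_pca", MinMaxScaler()),      # rescale the reduced features, again train-only
    ("svm", SVC(kernel="rbf", C=1.0)),
])
model.fit(X_train, y_train)             # fits all steps on the training set
pred = model.predict(X_test)            # applies the same fitted scalers and PCA to the test set
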
Category: Data Science

What is the difference between Linear SVM and SVM with linear kernel?

I'm wondering whether there is a difference between a Linear SVM and an SVM with a linear kernel. Or is a linear SVM just an SVM with a linear kernel? If so, what is the difference between the two variables linear_svm and linear_kernel_svm in the following code?

from sklearn import svm
linear_svm = svm.LinearSVC(C=1).fit(X_train, y_train)
linear_kernel_svm = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
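
Both fit a linear decision function, but they are different solvers: LinearSVC wraps liblinear (squared hinge loss by default, one-vs-rest multiclass, regularised intercept), while SVC(kernel='linear') wraps libsvm (hinge loss, one-vs-one multiclass). A small sketch on placeholder data, showing settings that bring LinearSVC closer to the libsvm formulation:

# LinearSVC with loss='hinge' and a large intercept_scaling approximates the
# libsvm linear SVM more closely than the defaults do. Data is a placeholder.
import numpy as np
from sklearn.svm import LinearSVC, SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

libsvm_linear = SVC(kernel="linear", C=1).fit(X, y)
liblinear_default = LinearSVC(C=1, dual=True).fit(X, y)                    # squared hinge, regularised intercept
liblinear_hinge = LinearSVC(C=1, loss="hinge", dual=True, intercept_scaling=10).fit(X, y)

print(libsvm_linear.coef_)      # all three expose a linear weight vector
print(liblinear_default.coef_)
print(liblinear_hinge.coef_)
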
Category: Data Science

Practical examples/tutorials of using One-Class Support Vector Machines

I am a newbie in machine learning and hope to solve an anomaly detection task using One-Class Support Vector Machines (OCSVM). Despite the availability of several general introductions, definitions and academic papers on OCSVM, I cannot find tutorials with practical examples, except for a few provided by scikit-learn. I'd appreciate pointers to such resources, especially ones with code examples and datasets, as these aid understanding better than academic papers.
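
For reference, a minimal anomaly-detection sketch with scikit-learn's OneClassSVM, in the spirit of the examples in the scikit-learn docs; the data, nu and gamma values below are placeholders:

# Train on "normal" data only, then flag test points the model considers outliers.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = 0.3 * rng.normal(size=(200, 2))               # "normal" training data
X_test = np.vstack([0.3 * rng.normal(size=(20, 2)),     # more normal points
                    rng.uniform(-4, 4, size=(20, 2))])  # plus obvious outliers

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.1)   # nu ~ expected outlier fraction
ocsvm.fit(X_train)
pred = ocsvm.predict(X_test)                            # +1 = inlier, -1 = outlier
print((pred == -1).sum(), "points flagged as anomalies")
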
Category: Data Science

Linear SVM in matlab and python giving different results

I have a particular dataset on which I am getting different results when using a linear SVM in MATLAB and in the sklearn toolbox. The data has been normalized in MATLAB and imported into Python from a .mat file. The code used in MATLAB is:

acc = 0;
for i = 1:10
    [train,test] = crossvalind('HoldOut',Y,0.2);
    mdl = fitcsvm(X(train,:),Y(train),'KernelFunction','linear');%,'BoxConstraint', 10,'KernelScale',0.001);
    predictions = predict(mdl,X(test,:));
    C = confusionmat(Y(test),predictions);
    acc(i) = (C(1,1)+C(2,2))/((C(1,1)+C(1,2)+C(2,1)+C(2,2)));
end
acc = sum(acc)/10;

The code used in Python is clf_opt = svm.SVC(C=10,gamma=0.001,kernel='linear',random_state=0, tol=1e-5) …
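
Two things worth noting: gamma has no effect for a linear kernel in sklearn, and fitcsvm defaults to BoxConstraint = 1 while the Python snippet uses C = 10, so the two runs are not using the same regularisation. A hedged sketch of an sklearn loop mirroring the MATLAB hold-out procedure with matched C; the placeholder data stands in for the arrays loaded from the .mat file (e.g. via scipy.io.loadmat):

# Ten random 80/20 hold-out splits with a linear SVM, C matched to fitcsvm's default.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# from scipy.io import loadmat   # in practice: X, Y loaded from the .mat file

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # placeholder for the normalized data
Y = (X[:, 0] > 0).astype(int)

accs = []
for i in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=i)
    clf = SVC(C=1, kernel='linear', tol=1e-5)  # C=1 matches fitcsvm's default BoxConstraint
    clf.fit(X_tr, y_tr)
    accs.append(accuracy_score(y_te, clf.predict(X_te)))
print(np.mean(accs))
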
Category: Data Science

Why does Liblinear perform drastically better than libsvm with a linear kernel?

I have a dataset of dim=(200,2000): 200 examples and 2000 features, with 10 classes. I used sklearn for both cases: svm.SVC(kernel='linear') and LinearSVC(). However, LinearSVC() performs drastically better than the SVM with a linear kernel: 60% against 23%. I'm supposed to get the same or comparable results since they are fed the same parameters and data. What's wrong? Thank you
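
One factor worth checking is the multiclass scheme: with 10 classes, SVC uses one-vs-one while LinearSVC uses one-vs-rest (and a squared hinge loss with a regularised intercept). A hedged sketch that isolates the multiclass scheme by wrapping SVC in OneVsRestClassifier; the data shape mirrors the question but the labels are placeholders:

# Compare the two solvers and a one-vs-rest-wrapped SVC under cross-validation.
import numpy as np
from sklearn.svm import SVC, LinearSVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2000))
y = rng.integers(0, 10, size=200)

for name, clf in [
    ("SVC linear (one-vs-one)", SVC(kernel="linear", C=1)),
    ("SVC linear wrapped in one-vs-rest", OneVsRestClassifier(SVC(kernel="linear", C=1))),
    ("LinearSVC (one-vs-rest, squared hinge)", LinearSVC(C=1, dual=True)),
]:
    scores = cross_val_score(clf, X, y, cv=3)
    print(name, scores.mean())
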
Category: Data Science

Obtain standard deviation for libsvm

I have the following code for grid search, but it only returns the accuracy result from 5-fold cross-validation. Is it possible to obtain the standard deviation across the 5 CV folds? How would you do that? Thanks in advance.

for i=1:numLog2c
    log2c = log2c_list(i);
    for j=1:numLog2g
        log2g = log2g_list(j);
        cmd = ['-q -v ', int2str(nFold), ' -c ', num2str(2^log2c), ' -g ', num2str(2^log2g),' ', svmCmd];
        cv = svmtrain(trainLabel, trainData, cmd);
        cvMatrix(i,j) = cv;
        if(cv >= bestcv)
            bestcv = cv;
            bestLog2c = …
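
LIBSVM's -v option returns only a single cross-validation accuracy, so a per-fold standard deviation requires running the folds yourself and keeping each fold's score. The question's code is MATLAB; the same idea is sketched below with scikit-learn (whose SVC wraps libsvm) purely for illustration, with placeholder data and grid:

# Run the folds explicitly so both mean and std over folds are available.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))
y = rng.integers(0, 2, size=150)

for log2c in [-1, 0, 1, 2]:
    for log2g in [-3, -2, -1]:
        scores = cross_val_score(SVC(C=2.0**log2c, gamma=2.0**log2g), X, y, cv=5)
        print(log2c, log2g, scores.mean(), scores.std())   # mean and std over the 5 folds
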
Category: Data Science

$\chi^{2}$ kernel SVM performance issue

I am using the $\chi^{2}$ kernel for a non-linear SVM (using libSVM) to classify MNIST digits. I am getting very bad performance (worse than random guessing). The $\chi^{2}$ kernel code (in MATLAB) is as follows:

function D = chi2_kernel(X,Y)
    % Computes the chi2 kernel for two matrices X and Y
    D = zeros(size(X,1),size(Y,1));
    for i = 1:size(Y,1)
        d = bsxfun(@minus,X, Y(i,:));
        s = bsxfun(@plus,X,Y(i,:));
        D(:,i) = sum(d.^2 ./ (s/2),2);
    end
    D = exp(-0.0001.*(1-D));
end

Using this I computed the kernel matrix as follows …
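
For comparison, the exponential $\chi^{2}$ kernel is usually defined as $k(x,y) = \exp\!\big(-\gamma \sum_{j} (x_{j}-y_{j})^{2}/(x_{j}+y_{j})\big)$, whereas the snippet above exponentiates $(1-D)$ with a very small coefficient. A hedged cross-check using scikit-learn's chi2_kernel with a precomputed-kernel SVC is sketched below; the data is a placeholder, and $\chi^{2}$ kernels assume non-negative features (e.g. raw pixel intensities or histograms).

# Cross-check with scikit-learn's chi2_kernel, which implements
# k(x, y) = exp(-gamma * sum_j (x_j - y_j)^2 / (x_j + y_j)).
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(100, 64))      # non-negative features
y_train = rng.integers(0, 10, size=100)
X_test = rng.uniform(0, 1, size=(20, 64))

gamma = 1.0
K_train = chi2_kernel(X_train, gamma=gamma)              # shape (n_train, n_train)
K_test = chi2_kernel(X_test, X_train, gamma=gamma)       # shape (n_test, n_train)

clf = SVC(kernel="precomputed", C=10)
clf.fit(K_train, y_train)
pred = clf.predict(K_test)
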
Topic: matlab svm libsvm
Category: Data Science

Use liblinear on big data for semantic analysis

I use Libsvm to train on the data and predict classifications for a semantic analysis problem. But it has performance issues on large-scale data, because semantic analysis is a high-dimensional problem. Last year, Liblinear was released, and it can resolve the performance bottleneck, but it costs too much memory. Is MapReduce the only way to solve the semantic analysis problem on big data? Or are there other methods that can relieve the memory bottleneck of Liblinear?
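
One memory-friendly alternative to holding the whole problem in RAM is out-of-core learning: hash the features into a fixed-size space and train a linear model incrementally in mini-batches. A hedged sketch with scikit-learn's HashingVectorizer and a hinge-loss SGDClassifier (i.e. a linear SVM trained by SGD) follows; the assumption that the data is text, and the streaming function and corpus, are placeholders rather than the asker's pipeline.

# Only one mini-batch is in memory at a time; the hashed feature space has a
# fixed footprint regardless of vocabulary size.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

def stream_batches():
    """Placeholder generator yielding (texts, labels) mini-batches."""
    yield ["good product", "bad service"], [1, 0]
    yield ["excellent quality", "terrible support"], [1, 0]

vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)  # fixed memory footprint
clf = SGDClassifier(loss="hinge")                                       # linear SVM trained with SGD

classes = [0, 1]
for texts, labels in stream_batches():
    X = vectorizer.transform(texts)                  # sparse, hashed features
    clf.partial_fit(X, labels, classes=classes)      # incremental update, batch by batch
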
Category: Data Science
