I have a pandas DataFrame with about a million rows and 3 columns. The columns are of 3 different datatypes: NumberOfFollowers is of a numerical datatype, UserName is of a categorical datatype, and Embeddings is of a categorical-set type (a fixed-length vector per row). df:

    Index  NumberOfFollowers  UserName  Embeddings     Target Variable
    0      15                 name1     [0.5 0.3 0.2]  0
    1      4                  name2     [0.4 0.2 0.4]  1
    2      8                  name3     [0.5 0.5 0.0]  0
    3      10                 name1     [0.1 0.0 0.9]  0
    ...    ...                ...       ...            ...

I would …
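For reference, a toy version of the frame and how I currently flatten it into a purely numeric matrix (all values made up; the encoding choices are mine):

    import pandas as pd

    # Toy version of the frame sketched above; values are made up.
    df = pd.DataFrame({
        "NumberOfFollowers": [15, 4, 8, 10],
        "UserName": ["name1", "name2", "name3", "name1"],
        "Embeddings": [[0.5, 0.3, 0.2], [0.4, 0.2, 0.4],
                       [0.5, 0.5, 0.0], [0.1, 0.0, 0.9]],
        "TargetVariable": [0, 1, 0, 0],
    })

    # One column per embedding dimension, one-hot for the categorical
    # column, and the numeric column kept as-is.
    emb = pd.DataFrame(df["Embeddings"].tolist(),
                       columns=[f"emb_{i}" for i in range(3)])
    users = pd.get_dummies(df["UserName"], prefix="user")
    X = pd.concat([df[["NumberOfFollowers"]], users, emb], axis=1).to_numpy()
    y = df["TargetVariable"].to_numpy()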
I am trying to recover the decision boundary from the model produced by "svmtrain" of LIBSVM in Octave. The output of the model is shown in the following; I have highlighted the parameters that I think correspond to the decision boundary equation: This is the decision boundary equation: How do I recover the decision boundary "u" using the equation and the model parameters above? I'd like to do this without calling "svmpredict". Thanks.
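My working assumption is that, for a linear kernel, u = sum_i sv_coef(i) * (SVs(i,:) * x') - rho. To check that arithmetic I reproduced it with scikit-learn's SVC, which wraps the same LIBSVM internals (the attribute names below are scikit-learn's, with intercept_ playing the role of -rho):

    import numpy as np
    from sklearn import svm
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=100, n_features=2,
                               n_redundant=0, random_state=0)
    clf = svm.SVC(kernel='linear').fit(X, y)

    # LIBSVM stores support vectors, their dual coefficients, and rho;
    # the decision value is u = sum_i sv_coef[i] * K(SV[i], x) - rho.
    w = (clf.dual_coef_ @ clf.support_vectors_).ravel()
    u_manual = X[:5] @ w + clf.intercept_
    print(np.allclose(u_manual, clf.decision_function(X[:5])))  # True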
I have the following problem. The minimization problem of the SVM that I want to solve is: $$ \min_{w, b} \frac{1}{2}w^{T}w + \sum^{m}_{i=1}C_{i}\xi_{i} $$ Subject to: $$ y_{i}(w^{T}x_{i} - b) \geq 1 - \xi_{i} $$ $$ \xi_{i} \geq 0 $$ $$ C_{i} = \nu_{i}C $$ where $\nu_{i}$ is some function. Now the minimization problem that the base SVM solves is: $$ \min_{w, b} \frac{1}{2}w^{T}w + C\sum^{m}_{i=1}\xi_{i} $$ Subject to: $$ y_{i}(w^{T}x_{i} - b) \geq 1 - \xi_{i} $$ $$ \xi_{i} …
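If it helps to make the per-sample cost concrete: my understanding is that $C_{i} = \nu_{i}C$ can be expressed through per-sample weights in solvers that rescale $C$ per instance, e.g. scikit-learn's SVC, where sample_weight multiplies C for each sample (a sketch, with an arbitrary made-up $\nu_{i}$):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)

    # nu_i: "some function" of the sample; an arbitrary stand-in here.
    nu = np.linspace(0.5, 2.0, num=len(y))

    # sample_weight rescales C per sample, so the effective cost of
    # sample i is C_i = nu_i * C, as in the objective above.
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, y, sample_weight=nu)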
I trained a multiclass SVC with an RBF kernel on a down-sampled (and therefore balanced) dataset. Now I want to perform a grid search to find the best cost and gamma. Which performance metric should I optimize for? I have a highly imbalanced test set; there may be a factor of over 100 between the numbers of instances of different classes. I am classifying 3D points (car, facade, human), so I think one could assign equal weight to all classes.
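For concreteness, the grid search I have in mind looks like this; macro-averaged F1 is only a placeholder for the metric I am asking about, and the grid values are arbitrary:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Stand-in for the balanced, down-sampled training set (3 classes).
    X_train, y_train = make_classification(n_samples=300, n_classes=3,
                                           n_informative=5, random_state=0)

    param_grid = {"C": [0.1, 1, 10, 100],
                  "gamma": [0.001, 0.01, 0.1, 1]}

    # scoring is the open question; 'f1_macro' weighs every class
    # equally, which seems to fit classes of equal importance.
    search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                          scoring="f1_macro", cv=5)
    search.fit(X_train, y_train)
    print(search.best_params_, search.best_score_)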
Is it valid to normalise a dataset, reduce its dimensionality with PCA, and then normalise the reduced-dimension data again? Assuming this is performed on training data, should the same PCA coefficients be used to reduce the dimension of the test data? Should the same max and min normalisation values be used for the test and training data? I have included a simplified example of the code I am using, which may describe what I mean better than words. Thanks in advance.

    %% Prepare …
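For comparison, the pattern I believe is standard, sketched in scikit-learn (my actual code is MATLAB, so the library is only for illustration): fit every step on the training data alone and reuse the fitted parameters on the test data.

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # normalise -> PCA -> normalise again, all fitted on training data only
    pipe = make_pipeline(MinMaxScaler(), PCA(n_components=5), MinMaxScaler())
    Z_train = pipe.fit_transform(X_train)  # learns min/max and PCA from train
    Z_test = pipe.transform(X_test)        # reuses those parameters on test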
I'm wondering whether there is a difference between a linear SVM and an SVM with a linear kernel. Or is a linear SVM just an SVM with a linear kernel? If so, what is the difference between the two variables linear_svm and linear_kernel_svm in the following code?

    from sklearn import svm

    linear_svm = svm.LinearSVC(C=1).fit(X_train, y_train)
    linear_kernel_svm = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
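From what I have read (and would like confirmed), LinearSVC is backed by liblinear with a squared hinge loss and one-vs-rest multiclass handling, while SVC uses libsvm with the standard hinge loss and one-vs-one. A sketch of how I would probe whether they agree on a binary problem:

    from sklearn import svm
    from sklearn.datasets import make_classification

    X_train, y_train = make_classification(n_samples=200, random_state=0)

    # With loss='hinge' the two should solve nearly the same problem
    # (up to liblinear also regularising the intercept), so the learned
    # hyperplanes should roughly coincide.
    a = svm.LinearSVC(C=1, loss='hinge', max_iter=100000).fit(X_train, y_train)
    b = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
    print(a.coef_, a.intercept_)
    print(b.coef_, b.intercept_)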
I am a newbie in machine learning and hope to solve an anomaly detection task using One-Class Support Vector Machines (OCSVM). Despite the availability of several general introductions, definitions, and academic papers on OCSVM, I cannot find tutorials with practical examples, except for a few provided by scikit-learn. I'd appreciate some pointers to such resources, especially ones with code examples and datasets, as these aid understanding better than reading academic papers.
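To show where I am starting from, this is about the extent of the worked examples I have found; a minimal scikit-learn sketch on synthetic data:

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    X_normal = rng.normal(0, 1, size=(200, 2))           # "normal" training data
    X_test = np.vstack([rng.normal(0, 1, size=(10, 2)),  # inliers
                        rng.uniform(4, 6, size=(5, 2))]) # obvious outliers

    # nu bounds the fraction of training points treated as outliers.
    ocsvm = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(X_normal)
    print(ocsvm.predict(X_test))  # +1 = inlier, -1 = anomaly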
I have a particular dataset on which I get different results when using a linear SVM in MATLAB and in the sklearn toolbox. The data was normalized in MATLAB and imported into Python from a .mat file. The code used in MATLAB is:

    acc = 0;
    for i = 1:10
        [train,test] = crossvalind('HoldOut',Y,0.2);
        mdl = fitcsvm(X(train,:),Y(train),'KernelFunction','linear'); %,'BoxConstraint', 10,'KernelScale',0.001);
        predictions = predict(mdl,X(test,:));
        C = confusionmat(Y(test),predictions);
        acc(i) = (C(1,1)+C(2,2))/(C(1,1)+C(1,2)+C(2,1)+C(2,2));
    end
    acc = sum(acc)/10;

The code used in Python is:

    clf_opt = svm.SVC(C=10,gamma=0.001,kernel='linear',random_state=0, tol=1e-5) …
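For reference, this is how I would mirror the MATLAB hold-out loop on the Python side (variable and file names are hypothetical; note that gamma has no effect with a linear kernel, so C is the only parameter that needs to match BoxConstraint):

    import numpy as np
    from scipy.io import loadmat
    from sklearn import svm
    from sklearn.model_selection import train_test_split

    # Hypothetical names for the data exported from MATLAB.
    mat = loadmat('data.mat')
    X, Y = mat['X'], mat['Y'].ravel()

    accs = []
    for i in range(10):
        X_tr, X_te, y_tr, y_te = train_test_split(X, Y, test_size=0.2,
                                                  stratify=Y, random_state=i)
        # gamma is ignored for kernel='linear'; C mirrors BoxConstraint.
        clf = svm.SVC(C=10, kernel='linear', tol=1e-5).fit(X_tr, y_tr)
        accs.append(clf.score(X_te, y_te))
    print(np.mean(accs))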
I have a dataset of dim=(200,2000): 200 examples and 2000 features, with 10 classes. I used sklearn for both cases: svm.SVC(kernel='linear') and LinearSVC(). However, LinearSVC() performs drastically better than the SVM with a linear kernel: 60% against 23%. I'm supposed to get the same or comparable results, since they are fed the same parameters and data. What's wrong? Thank you
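The comparison I ran looks roughly like this (synthetic stand-in data with the same shape as my real set):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC, LinearSVC

    # 200 examples, 2000 features, 10 classes, like the real data.
    X, y = make_classification(n_samples=200, n_features=2000, n_classes=10,
                               n_informative=20, random_state=0)

    print(cross_val_score(SVC(kernel='linear'), X, y, cv=5).mean())
    print(cross_val_score(LinearSVC(max_iter=10000), X, y, cv=5).mean())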
I am trying to do an image classification where each sample of training data contains the data of the current pixel together with the 8 surrounding ones. Where can I find examples of SVMs, in Python, that use 5 or more features in training?
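In case it clarifies the setup, this is roughly the feature extraction I mean; a sketch that turns each interior pixel plus its 3x3 neighbourhood into a 9-feature sample (image and labels are made up):

    import numpy as np
    from numpy.lib.stride_tricks import sliding_window_view
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    img = rng.random((32, 32))              # stand-in grayscale image
    labels = rng.integers(0, 2, (30, 30))   # made-up label per interior pixel

    # Each interior pixel becomes one sample: itself plus 8 neighbours.
    windows = sliding_window_view(img, (3, 3))  # shape (30, 30, 3, 3)
    X = windows.reshape(-1, 9)                  # 9 features per sample
    y = labels.ravel()

    clf = SVC(kernel='rbf').fit(X, y)  # the SVM is agnostic to feature count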
I am using the SVM implementation provided by scikit-learn, and I would like to know whether I need to perform standardization before fitting the model. As far as I know, LIBSVM tends to require pre-processing of the data. I am not sure whether scikit-learn normalizes the data automatically or expects us to handle it ourselves.
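This is what I would do if standardization is indeed required; a sketch with a Pipeline so the scaler's statistics come from the training split only:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # StandardScaler learns its mean/variance from the training data only.
    model = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))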
I have the following code for a grid search, but it only returns the accuracy averaged over 5-fold cross-validation. Is it possible to obtain the standard deviation across the 5 folds of CV? How would you do that? Thanks in advance.

    for i = 1:numLog2c
        log2c = log2c_list(i);
        for j = 1:numLog2g
            log2g = log2g_list(j);
            cmd = ['-q -v ', int2str(nFold), ' -c ', num2str(2^log2c), ' -g ', num2str(2^log2g), ' ', svmCmd];
            cv = svmtrain(trainLabel, trainData, cmd);
            cvMatrix(i,j) = cv;
            if (cv >= bestcv)
                bestcv = cv;
                bestLog2c = …
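My understanding is that LIBSVM's -v option only reports the mean, so getting a standard deviation would mean iterating over the folds manually. To illustrate what I am after, here is the idea in Python, where per-fold scores are available directly (my actual code is MATLAB):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)

    # One cell of the grid: per-fold accuracies, then their mean and std.
    scores = cross_val_score(SVC(C=2**1, gamma=2**-3), X, y, cv=5)
    print(scores.mean(), scores.std())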
I am using a $\chi^{2}$ kernel for a non-linear SVM (using libSVM) to classify MNIST digits, and I am getting very bad performance (worse than random guessing). The $\chi^{2}$ kernel code (in MATLAB) is as follows:

    function D = chi2_kernel(X, Y)
    % Computes the chi2 kernel for two matrices X and Y
    D = zeros(size(X,1), size(Y,1));
    for i = 1:size(Y,1)
        d = bsxfun(@minus, X, Y(i,:));
        s = bsxfun(@plus, X, Y(i,:));
        D(:,i) = sum(d.^2 ./ (s/2), 2);
    end
    D = exp(-0.0001 .* (1 - D));
    end

Using this I computed the kernel matrix as follows …
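As a sanity check I would like to compare against a reference implementation; if I am not mistaken, scikit-learn ships one as sklearn.metrics.pairwise.chi2_kernel, computing exp(-gamma * sum((x - y)^2 / (x + y))), and it can be passed to an SVC as a callable:

    from sklearn.datasets import load_digits
    from sklearn.metrics.pairwise import chi2_kernel
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    # Small digit set as an MNIST stand-in; chi2 expects non-negative features.
    X, y = load_digits(return_X_y=True)

    # SVC builds the Gram matrix itself from the callable kernel.
    clf = SVC(kernel=lambda A, B: chi2_kernel(A, B, gamma=0.0001))
    print(cross_val_score(clf, X, y, cv=3).mean())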
I use LIBSVM to train on data and predict classifications in a semantic analysis problem, but it has performance issues on large-scale data, because semantic analysis is a high-dimensional problem. Last year, LIBLINEAR was released, and it can resolve that performance bottleneck, but it costs too much memory. Is MapReduce the only way to solve semantic analysis problems on big data, or are there other methods that can address LIBLINEAR's memory bottleneck?
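One direction I have considered (and would like opinions on) is out-of-core learning: streaming the data through a linear model with hinge loss in mini-batches instead of loading everything at once. A sketch of that idea with scikit-learn's SGDClassifier, which approximates a linear SVM:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=10000, n_features=100, random_state=0)
    classes = np.unique(y)

    # loss='hinge' makes this a stochastic linear SVM; partial_fit keeps
    # only one mini-batch in memory at a time.
    clf = SGDClassifier(loss='hinge')
    for start in range(0, len(y), 1000):
        batch = slice(start, start + 1000)
        clf.partial_fit(X[batch], y[batch], classes=classes)
    print(clf.score(X, y))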