When would one use Random Forest over SVM and vice versa? I understand that cross-validation and model comparison are an important aspect of choosing a model, but here I would like to learn more about rules of thumb and heuristics for the two methods. Can someone please explain the subtleties, strengths, and weaknesses of the two classifiers, as well as the problems that are best suited to each of them?
I am trying to predict lithofacies, i.e. the rock type, from well log data, a project very similar to the one described in this tutorial. A well log can be seen as a 1D curve tracking how a given property (e.g. gamma radiation, electrical resistivity, etc.) varies as a function of depth. The idea is to use these 1D arrays as the input features to train a machine learning model (e.g. SVM or Random Forest) to infer the facies …
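A minimal sketch of that setup, with randomly generated placeholder logs and facies labels standing in for the real well data (the number of curves and classes is assumed, not taken from the question):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical well-log data: one row per depth sample, one column per log curve
# (e.g. gamma ray, resistivity, ...). Labels are placeholder facies codes.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))          # 4 log curves, 500 depth samples
y = rng.integers(0, 3, size=500)       # 3 facies classes (placeholder)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```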
I am working on a problem where I need to predict the text corresponding to another text in my training data file. For example, if one of my columns has the value software and the corresponding column holds the value adobe pdf, then my algorithm should be able to make the same kind of prediction on my test data. For example, if my test data has Tableau, then the predicted category should be software, corresponding to …
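A minimal sketch of one way this is commonly framed (text value → category), using a TF-IDF plus linear SVM pipeline as an assumption; the example rows are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical training pairs: item text -> category label.
items = ["adobe pdf", "microsoft excel", "dell laptop", "hp printer"]
labels = ["software", "software", "hardware", "hardware"]

# Character n-grams help a little with unseen item names.
model = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)), LinearSVC())
model.fit(items, labels)

# Predict the category for unseen item text.
print(model.predict(["Tableau"]))
```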
This is a screenshot of my code. I used abc.best_estimator_ (my GridSearchCV model) to find the best results. As you can see, the grid has values of C=1 and C=100 along with other values. abc.best_estimator_ says C=1 is the best value. For cross-checking I tried using different values of C, and here I'm getting a better score for C=100. I was getting similar results while tuning gamma as well, but later on I commented out gamma so as to focus on …
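For reference, a self-contained sketch of how the grid's cross-validated scores can be inspected and compared (iris and the small grid here are placeholders, not the question's data); best_estimator_ is chosen by the mean cross-validated score, which can disagree with the score measured on one particular train/test split:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(SVC(kernel="rbf"), {"C": [1, 100]}, cv=5)
grid.fit(X, y)

# Per-candidate cross-validation results, for comparison with any hold-out score.
results = pd.DataFrame(grid.cv_results_)
print(results[["param_C", "mean_test_score", "std_test_score", "rank_test_score"]])
print(grid.best_params_, grid.best_score_)
```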
While using support vector machines (SVMs), we encounter 3 types of lines (for a 2D case): one is the decision boundary and the other two are the margins. Why do we use $+1$ and $-1$ as the values after the $=$ sign when writing the equations for the SVM margins? What's so special about $1$ in this case? For example, if $x$ and $y$ are the two features, then the decision boundary is $ax+by+c=0$. Why are the two marginal boundaries represented as …
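For context, the standard scaling argument behind that choice, written out (this is the usual textbook normalisation, not anything specific to this question): multiplying $(a, b, c)$ by any $k > 0$ leaves the decision boundary $ax+by+c=0$ unchanged, so one is free to rescale the coefficients until the closest points on either side satisfy
$$ax + by + c = +1 \quad\text{and}\quad ax + by + c = -1,$$
and with that normalisation the margin width works out to $2/\sqrt{a^2+b^2}$.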
I'm trying to understand how to plot the SVM hyperplane and its margins using this example: https://scikit-learn.org/stable/auto_examples/svm/plot_svm_margin.html I got stuck at the part that plots the parallels: # plot the parallels to the separating hyperplane that pass through the # support vectors (margin away from hyperplane in direction # perpendicular to hyperplane). This is sqrt(1+a^2) away vertically in # 2-d. margin = 1 / np.sqrt(np.sum(clf.coef_ ** 2)) yy_down = yy - np.sqrt(1 + a ** 2) * margin yy_up = yy …
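For context, the $\sqrt{1+a^2}$ factor in that comment comes from a standard bit of plane geometry (spelled out here as a sketch, not part of the example's own text): if the separating line has slope $a$, then a perpendicular distance of $\text{margin}$ between two parallel lines corresponds to a vertical offset of
$$\text{margin} \cdot \sqrt{1+a^2},$$
because the vertical gap between parallel lines equals the perpendicular gap divided by $\cos\theta$, and $1/\cos\theta = \sqrt{1+\tan^2\theta} = \sqrt{1+a^2}$ for a line at angle $\theta$ with slope $a = \tan\theta$.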
I am trying to train an SVM classifier using scikit-learn. At training time I want to reduce the feature vector dimension, and I have used PCA for this: pp = PCA(n_components=400).fit(features) features = pp.transform(features) PCA requires an m x n dataset to determine the variance, but at inference time I have only a single image and its corresponding 1D feature vector. I am wondering how to reduce the feature vector at inference time so that it matches the training dimension. Or …
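A minimal sketch of the usual pattern here, assuming the PCA fitted at training time (or a saved copy of it) is still available at inference time; the feature dimension 2048 is a placeholder:

```python
import numpy as np
from joblib import dump, load
from sklearn.decomposition import PCA

# Training: fit PCA on the full training feature matrix and keep the fitted object.
features = np.random.rand(1000, 2048)           # placeholder training features
pp = PCA(n_components=400).fit(features)
dump(pp, "pca.joblib")                          # persist the fitted PCA for inference

# Inference: a single 1-D feature vector is reshaped to a (1, n_features) matrix
# and transformed with the *already fitted* PCA (no refitting on one sample).
pp = load("pca.joblib")
single_feature = np.random.rand(2048)
reduced = pp.transform(single_feature.reshape(1, -1))   # shape (1, 400)
```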
I am doing Covid-19 case prediction using SVR and I am getting negative values, even though the number of Covid-19 cases should never be negative. The feature inputs I used are a mobility factor (which contains negative values) and the daily Covid-19 cases. The kernel I used is the RBF kernel. Can anyone explain why I am getting negative values? Is it because of the independent variable (mobility) that I used?
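As an illustration only (an assumption about how one might handle this, not necessarily the fix for this data): SVR itself does not constrain its output range, so a common workaround is to model a transformed, non-negative target and clip any small residual negatives:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder data: a mobility factor (can be negative) plus lagged daily cases.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=200), rng.poisson(50, size=200)])
y = rng.poisson(60, size=200).astype(float)           # daily case counts, non-negative

# Fit the RBF SVR on log1p(cases) and invert with expm1, which strongly
# discourages negative outputs; anything left below zero is clipped.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X, y)
pred = np.clip(model.predict(X), 0, None)             # enforce non-negativity
```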
I know that you're supposed to scale your test data using the parameters (mean and stdev) from your training data. This is relatively simple; but what if the number of samples in one training data set is limited (e.g. Set A = 5 samples) and I want to combine two data sets (i.e. Set A + Set B = 10 samples) to have enough samples for training? What can I do so that I can scale/normalize the two sets into …
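A minimal sketch of the combine-then-scale idea described, with placeholder arrays standing in for Set A, Set B, and the test data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrices for the two small training sets and the test set.
set_A = np.random.rand(5, 3)
set_B = np.random.rand(5, 3)
X_test = np.random.rand(4, 3)

# Fit the scaler on the *combined* training data only, then reuse its
# mean/std for everything that comes later, including the test data.
X_train = np.vstack([set_A, set_B])
scaler = StandardScaler().fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```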
I am classifying about 3,000 people's faces using FaceNet. Each person has about 100 photos. FaceNet first calculates a face embedding (a feature vector) for each photo, so each person has 100 face embeddings. What I want to do is aggregate each person's face embeddings into one. What is the best way of doing this? I have tried the mean, but I am not sure whether this is the recommended way. The reason I …
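A minimal sketch of the mean aggregation described; the L2 re-normalisation at the end is an assumption (FaceNet-style embeddings are typically compared on the unit hypersphere), not something stated in the question:

```python
import numpy as np

# Placeholder: 100 embeddings of dimension 128 for one person.
embeddings = np.random.rand(100, 128)

# Average the per-photo embeddings into a single template for that person ...
template = embeddings.mean(axis=0)

# ... and (assumption) re-normalise to unit length so it can be compared with
# cosine/Euclidean distance like an ordinary FaceNet embedding.
template = template / np.linalg.norm(template)
```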
To summarize the problem: I have a data set with ~1450 samples, 19 features and a binary outcome where the classes are fairly balanced (0.51 to 0.49). I split the data into a train set and a test set using train_test_split(X, Y, test_size = 0.30, random_state = 42). I am using the train set to tune hyper-parameters of the algorithms, optimizing for specificity with GridSearchCV, a RepeatedStratifiedKFold (10 splits, 3 repeats) cross-validation, and scoring=make_scorer(recall_score, pos_label=0). I am then using the predictions …
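For reference, a sketch of the tuning setup as described (RepeatedStratifiedKFold with 10 splits and 3 repeats, recall of the negative class as the "specificity" score); the synthetic data, the SVC estimator, and the tiny grid are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold, train_test_split
from sklearn.svm import SVC

# Placeholder data matching the rough shape described in the question.
X, Y = make_classification(n_samples=1450, n_features=19, weights=[0.51, 0.49], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.30, random_state=42)

specificity = make_scorer(recall_score, pos_label=0)   # recall of the negative class
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, scoring=specificity, cv=cv)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```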
I'm doing sentiment analysis of tweets related to the recent acquisition of Twitter by Elon Musk. I have a corpus of 10 000 tweets and I'd like to apply machine learning methods using models like SVM and Linear Regression. My question is: when I want to train the models, do I have to manually tag a big portion of those 10 000 collected tweets as either positive or negative to train the model correctly, or can I use some other dataset …
I'm reading the paper Twin Support Vector Machines for Pattern Classification by Jayadeva et al. (2007). In that paper, the authors propose using two non-parallel hyperplanes for classifying the two classes. The objective function for learning one hyperplane is: $$ \underset{w^{(1)}, b^{(1)}, q}{\text{Min}} \; \frac{1}{2} (Aw^{(1)} + e_1 b^{(1)})^T (Aw^{(1)} + e_1 b^{(1)}) + c_1 e_2^T q \\ \text{subject to} \quad -(B w^{(1)} + e_2 b^{(1)}) + q \ge e_2, \quad q \ge 0 $$ where $A$ and $B$ are the data points belonging to …
Tl;DR: You can predict something, but how do you explain the prediction? Your usual classification/regression setup: the data is a classic regression/classification problem with several numerical columns, several nominal columns, and an event we are trying to predict:
user1, age:18, wealth:20000, likes:tomatoes, isInBigCity:yes, hasClicked:yes
user2, age:25, wealth:24000, likes:carrots, isInBigCity:no, hasClicked:no
...
With the help of Random Forests, SVM, Logistic Regression, a Deep Neural Network, or some other method we export a model that can output a probability …
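As one illustration of a model-agnostic explanation (an assumption here, not the only approach and not something named in the question), permutation importance can be computed for whichever fitted model produced the probabilities:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Placeholder tabular data standing in for the age/wealth/... columns.
X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# How much does shuffling each feature hurt the score? A larger drop means
# the model relies more heavily on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)
```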
I am working on a hybrid CNN-SVM for a classification task, where I aim to use the CNN for feature extraction and the SVM for classification. So after training my CNN model as below: import os import numpy as np import matplotlib.pyplot as plt import tensorflow as tf import keras from keras.layers import Dense, Conv2D, InputLayer, Flatten, MaxPool2D (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data() print('Training data: {}, {}'.format(x_train.shape, y_train.shape)) print('Test data: {}, {}'.format(x_test.shape, y_test.shape)) x_train, x_test = x_train / 255.0, x_test / …
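A self-contained sketch of the CNN-to-SVM hand-off, assuming the output of the Flatten layer is used as the feature vector (the tiny architecture, the single training epoch, and the choice of layer are all assumptions for illustration, not the question's exact model):

```python
import keras
from keras.layers import Conv2D, Dense, Flatten, MaxPool2D
from sklearn.svm import SVC

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = (x_train / 255.0)[..., None]   # add channel dimension -> (n, 28, 28, 1)
x_test = (x_test / 255.0)[..., None]

# Small CNN trained briefly, just so there are weights to extract features from.
model = keras.Sequential([
    Conv2D(16, 3, activation="relu"),
    MaxPool2D(),
    Flatten(name="features"),            # output of this layer = feature vector
    Dense(64, activation="relu"),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x_train, y_train, epochs=1, batch_size=128)

# Cut the network at the Flatten layer and use its output as SVM features.
feature_extractor = keras.Model(inputs=model.input, outputs=model.get_layer("features").output)
train_feats = feature_extractor.predict(x_train[:5000])   # subset keeps the SVM fast
test_feats = feature_extractor.predict(x_test[:1000])

svm = SVC(kernel="rbf")
svm.fit(train_feats, y_train[:5000])
print(svm.score(test_feats, y_test[:1000]))
```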
I need to generate an equation for the hyperplane; I have two independent variables and one binary dependent variable. Regarding the following equation for an SVM, $f(x)=\operatorname{sgn}\left( \sum_i \alpha_i K(sv_i,x) + b \right)$: I have two independent variables (say P and Q) with 130 point values for each variable. I used an SVM with a radial basis function kernel for binary classification (0 and 1), and I calculated the radial-basis kernelized case; now I have one column of 51 $y_i \alpha_i$ …
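As a sketch of how that kernel expansion can be evaluated numerically (scikit-learn attribute names and the placeholder data are assumptions about how the model was fit):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Placeholder data: two features P and Q, 130 points, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(130, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

gamma = 0.5                                   # fixed so it can be reused below
clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)

# f(x) = sum_i alpha_i * K(sv_i, x) + b, where dual_coef_ holds the signed
# alphas (y_i * alpha_i) and intercept_ holds b.
x_new = np.array([[0.3, -0.1]])
K = rbf_kernel(clf.support_vectors_, x_new, gamma=gamma)    # shape (n_SV, 1)
f = clf.dual_coef_ @ K + clf.intercept_
print(np.sign(f), clf.decision_function(x_new))             # should agree in sign/value
```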
I have been searching for a long time on the internet and in papers for the answer to a simple question: am I able to train a Support Vector Regression algorithm with different data sets? If yes, what is this approach called? I have 10 of the same battery with different usage, temperature and capacity. Usage and temperature are the features ($x_{i,t}$) and capacity is the output ($y_{i,t}$). Battery_1 up to timepoint n: $[x_{1,1}\; y_{1,1};\; \ldots;\; x_{1,n}\; y_{1,n}]$ ... Battery_10 up to timepoint n: …
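If the intent is simply to pool the batteries into one training set (one possible reading of the question, assumed here for illustration), a sketch could look like this, with randomly generated per-battery arrays as placeholders:

```python
import numpy as np
from sklearn.svm import SVR

# Placeholder: 10 batteries, each with 50 time points of (usage, temperature) -> capacity.
rng = np.random.default_rng(0)
batteries_X = [rng.normal(size=(50, 2)) for _ in range(10)]   # features per battery
batteries_y = [rng.normal(size=50) for _ in range(10)]        # capacity per battery

# Pool all batteries into a single training set and fit one SVR on it.
X = np.vstack(batteries_X)
y = np.concatenate(batteries_y)

svr = SVR(kernel="rbf").fit(X, y)
```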
I have trained my multiclass SVM model for MNIST classification in Python with scikit-learn, using the following code: from sklearn.svm import SVC from sklearn.model_selection import GridSearchCV parameters = {'kernel':['rbf'], 'C':[1, 10, 100, 1000], 'gamma':[1e-3, 1e-4]} clf = GridSearchCV(SVC(), parameters) clf.fit(xtrain, y_train) svmclf = clf.best_estimator_ svmclf.fit(xtrain, y_train) I wanted to get some parameters of the trained SVM: the support vectors, alpha values and bias. So I tried this: SVs= clf.best_estimator_.support_vectors_ Alpha= clf.best_estimator_._dual_coef_ bias=clf.best_estimator_.intercept_ I checked the shape of these parameters and it …
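For reference, a sketch of what those attributes look like on a small multiclass SVC (sklearn's digits dataset is used here as a stand-in for MNIST, and the C/gamma values are placeholders):

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
clf = SVC(kernel="rbf", C=10, gamma=1e-3).fit(X, y)

# Multiclass SVC is trained one-vs-one, which determines these shapes:
print(clf.support_vectors_.shape)   # (n_SV, n_features)
print(clf.dual_coef_.shape)         # (n_classes - 1, n_SV): signed alphas in the ovo layout
print(clf.intercept_.shape)         # (n_classes * (n_classes - 1) / 2,): one bias per class pair
```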
I would like to run an SVM for my classification problem using the Earth Mover's Distance (EMD) as the distance measure. As I understand the documentation for Python scikit-learn (https://scikit-learn.org/stable/modules/svm.html#svm-kernels), it is possible to use custom kernel functions: import numpy as np from sklearn import svm def my_kernel(X, Y): return np.dot(X, Y.T) clf = svm.SVC(kernel=my_kernel) There is also a package with EMD implemented (https://pypi.org/project/pyemd/). I tried to run it similarly to the example, using my own data (below). I have distributions …
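A sketch of how a Gram-matrix style custom kernel built on pyemd might look; the exponential mapping exp(-gamma * EMD) is an assumption (EMD is a distance, not a kernel, and a kernel built this way is not guaranteed to be positive semi-definite), and the histograms and ground-distance matrix are placeholders:

```python
import numpy as np
from pyemd import emd
from sklearn import svm

def emd_kernel(X, Y, distance_matrix, gamma=1.0):
    """Return the Gram matrix K[i, j] = exp(-gamma * EMD(X[i], Y[j]))."""
    K = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            K[i, j] = np.exp(-gamma * emd(x.astype(np.float64),
                                          y.astype(np.float64),
                                          distance_matrix))
    return K

# Placeholder data: 20 histograms with 8 bins each, binary labels,
# and a |i - j| ground-distance matrix between the bins.
rng = np.random.default_rng(0)
X = rng.random((20, 8))
labels = rng.integers(0, 2, size=20)
bins = np.arange(8, dtype=np.float64)
D = np.abs(bins[:, None] - bins[None, :])

clf = svm.SVC(kernel=lambda A, B: emd_kernel(A, B, D))
clf.fit(X, labels)
print(clf.predict(X[:3]))
```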
I am currently struggling to find an analytical solution for the $\alpha_k$. I have derived the following constrained optimization problem: $$ L = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i \alpha_j y_i y_j (\mathbf{x}_i^T \mathbf{x}_j) $$ $$ \text{s.t.} \quad 0 \leq \alpha_i \leq C \quad \forall i, \quad \sum_{i=1}^{N} \alpha_i y_i = 0 $$ I had, at first, not taken the constraints into account, which, after taking the derivative w.r.t. $\alpha_k$, gave me: $$ y_k \sum_{i=1}^{N} \alpha_i y_i (\mathbf{x}_i^T \mathbf{x}_k) = 1 …