I have 1000 data points from the bivariate normal distribution $\mathcal{N}$ with mean $(0,0)$, variances $\sigma_1^2=\sigma_2^2=10$, and covariances $0$. There are also 20 more points from another bivariate normal distribution with mean $(15,15)$, variances $\sigma_1^2=\sigma_2^2=1$, and covariances $0$ again. I used the least squares method to calculate the parameters of the decision boundary $\theta_0 + \theta_1 x_1 + \theta_2 x_2=0$, that is $$\theta = (X^T X)^{-1}(X^Ty)$$ where $y$ is a column matrix with …
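A minimal sketch of that least-squares fit, assuming the labels in $y$ are coded $0$ for the large cluster and $1$ for the small one (the excerpt truncates before the coding is stated):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 points from N((0,0), diag(10,10)) labelled 0, 20 points from N((15,15), diag(1,1)) labelled 1
X_a = rng.normal(loc=0.0, scale=np.sqrt(10), size=(1000, 2))
X_b = rng.normal(loc=15.0, scale=1.0, size=(20, 2))
X = np.vstack([X_a, X_b])
y = np.concatenate([np.zeros(1000), np.ones(20)])

# Design matrix with an intercept column for theta_0
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: theta = (X^T X)^{-1} X^T y
theta = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)
print(theta)  # [theta_0, theta_1, theta_2]; boundary is theta_0 + theta_1*x1 + theta_2*x2 = 0
```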
The log of the odds of the response variable being 1 has a linear relationship with the predictor variables. Hence, the log-odds is equal to a linear function of the predictors. Is there any way to check the linearity between the response variable and the predictor variables?
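One common diagnostic is to bin a predictor, compute the empirical log-odds of the response within each bin, and see whether those values lie roughly on a straight line against the bin centres. A minimal sketch (the helper name and binning scheme are just for illustration):

```python
import numpy as np

def empirical_logits(x, y, n_bins=10):
    """Bin a predictor and compute the empirical log-odds of y == 1 in each bin.

    If the log-odds really is linear in x, a plot of these values against the
    bin centres should look approximately like a straight line.
    """
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    centres, logits = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        if not mask.any():
            continue
        p = np.clip(y[mask].mean(), 1e-3, 1 - 1e-3)  # avoid log(0) in extreme bins
        centres.append(x[mask].mean())
        logits.append(np.log(p / (1 - p)))
    return np.array(centres), np.array(logits)
```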
I was wondering if I can visualize with an example the fact that for all points $x$ on the separating hyperplane, the following equation holds true: $$w^T x+w_0=0\quad\quad\quad \text{... equation (1)}$$ Here, $w$ is a weight vector and $w_0$ is a bias term (related to the perpendicular distance of the separating hyperplane from the origin) defining the separating hyperplane. I was trying to visualize this in 2D space. In 2D, the separating hyperplane is nothing but the decision boundary. So, I took the following example: $w=[1\quad 2], …
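A quick numeric check of equation (1) in 2D; the excerpt truncates before the bias is given, so $w_0=-2$ below is purely an assumed value for illustration:

```python
import numpy as np

# Assumed values for illustration: w = [1, 2] from the example, w_0 = -2 chosen arbitrarily
w = np.array([1.0, 2.0])
w0 = -2.0

# Parametrise the boundary x1 + 2*x2 - 2 = 0 by choosing x1 freely and solving for x2
x1 = np.linspace(-5, 5, 11)
x2 = (-w0 - w[0] * x1) / w[1]
points = np.column_stack([x1, x2])

# Every point on the line satisfies w^T x + w_0 = 0 (up to floating-point error)
print(points @ w + w0)
```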
I have read a lot, but I am still not able to get the following concepts: (1) If a classifier is given, how do we know whether it is a linear or non-linear classifier? (I am interested in a step-by-step procedure for judging the classifier.) (2) If a classifier is linear, then its decision boundary is linear (true or false?). (3) If a decision boundary is linear, then its classifier is linear (true or false?). Now, let's suppose we have to …
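For (1), one purely numerical check, sketched below on synthetic data: a classifier whose decision score is an affine function of the input has a linear decision boundary, and affinity can be probed on random convex combinations of points. The helper `looks_affine` is a hypothetical name used only for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

def looks_affine(decision, X, n_pairs=200, seed=0):
    """Numerically test whether a decision function is affine in its input.

    For affine f, f(a*x + (1-a)*z) == a*f(x) + (1-a)*f(z) for any a in [0, 1];
    a classifier whose score is affine in x has a linear decision boundary.
    """
    rng = np.random.default_rng(seed)
    i, j = rng.integers(len(X), size=(2, n_pairs))
    a = rng.random((n_pairs, 1))
    mid = a * X[i] + (1 - a) * X[j]
    lhs = decision(mid)
    rhs = a.ravel() * decision(X[i]) + (1 - a.ravel()) * decision(X[j])
    return np.allclose(lhs, rhs, atol=1e-6)

lin = LogisticRegression().fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)
print(looks_affine(lin.decision_function, X))  # True: score is affine, boundary is linear
print(looks_affine(rbf.decision_function, X))  # False: non-linear classifier
```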
Is this dataset linearly separable? If not, can it be converted into one by applying some function as it seems to follow the same pattern? Also, which classification algorithms could be used to fit this dataset?
I want to understand the kernel selection rationale in SVM. One basic thing I understand is that if the data is linear, then we should go for a linear kernel, and if it is non-linear, then for other kernels. But the question is how to tell whether the given data is linear or not, especially when it has many features. I know that by cross validation I can try feeding different kernels and select whichever performs best, but …
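A minimal sketch of that cross-validation comparison with scikit-learn, using a synthetic dataset as a stand-in for the real feature matrix:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data standing in for the real feature matrix
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Compare kernels by cross-validated accuracy
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{kernel:8s} mean accuracy = {scores.mean():.3f}")
```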
In the following Linear Regression discussion I didn't understand a few things, so my questions are: In the third slide, what does the probability $P\left(y_i|x_i\right)$ mean, and accordingly what does it mean to maximize it? Does it mean maximizing both $P\left(y_i=1|x_i\right)$ and $P\left(y_i=0|x_i\right)$, and is it the case that the higher this probability, the more stable and correct the results, and accordingly the more correct the weights $w^*$? In the fourth slide I don't see the math, could anyone detail …
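If the slides describe a logistic model with a sigmoid link (an assumption here, since the slides are not shown), then "maximize $P(y_i \mid x_i)$" means choosing $w$ so that each observed label gets high probability under the model, usually by maximizing the sum of log-probabilities over all points. A minimal sketch:

```python
import numpy as np

def log_likelihood(w, X, y):
    """Bernoulli log-likelihood sum_i log P(y_i | x_i) for a logistic model.

    P(y_i = 1 | x_i) = sigmoid(w^T x_i).  Maximising this sum over w picks the
    weights that assign the highest probability to the labels actually observed;
    the maximiser is the w* referred to in the question.
    """
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```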
I have a dataset which contains a lot of features (>>3). For computational reasons, I would like to apply dimensionality reduction. At this point I could use different techniques: standard PCA, kernel PCA, LLE, ... My problem is choosing the right approach, since the number of features is so high that I cannot know beforehand what the distribution of points looks like. I could do that only if I had 3D data, but in my case I have …
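One pragmatic option is to run a few of these reducers side by side and compare the resulting embeddings (or downstream performance). A minimal sketch with scikit-learn, using a synthetic dataset as a stand-in and with hyperparameters (`gamma`, `n_neighbors`) chosen only for illustration:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import LocallyLinearEmbedding

# Synthetic stand-in; swap in the real high-dimensional feature matrix
X, _ = make_swiss_roll(n_samples=1000, random_state=0)

reducers = {
    "PCA": PCA(n_components=2),
    "Kernel PCA (RBF)": KernelPCA(n_components=2, kernel="rbf", gamma=0.02),
    "LLE": LocallyLinearEmbedding(n_components=2, n_neighbors=12),
}

for name, reducer in reducers.items():
    Z = reducer.fit_transform(X)
    print(name, Z.shape)  # each yields a 2-D embedding to inspect or feed downstream
```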
For example, the Wikipedia article on linear separability gives the following example. They say "The following example would need two straight lines and thus is not linearly separable". On the other hand, in Bishop's 'Pattern Recognition and Machine Learning' book, he says "Data sets whose classes can be separated exactly by linear decision surfaces are said to be linearly separable". Under Bishop's definition of linear separability, I think the Wikipedia example would be linearly separable, even though the author …