Please refer to section 4.1.3 of Pattern Recognition and Machine Learning (Bishop), "Least squares for classification". In a two-class linear discriminant system, we classify a vector $\mathbf{x}$ as $\mathcal{C}_1$ if $y(\mathbf{x}) > 0$, and as $\mathcal{C}_2$ otherwise. Generalizing in section 4.1.3, we define $K$ linear discriminant functions, one per class: $$y_k(\mathbf{x}) = \mathbf{w}_k^{\mathsf{T}}\mathbf{x} + w_{k0} \tag{4.13}$$ Prepending a leading 1 to the vector $\mathbf{x}$ yields $\tilde{\mathbf{x}}$, and the linear discriminant function for the $K$-class case is given by $\mathbf{y}(\mathbf{x}) = \widetilde{\mathbf{W}}^{\mathsf{T}}\tilde{\mathbf{x}}$. The author progresses …
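For intuition, here is a minimal sketch (my own code, not from the book; the names `W_tilde`, `X_tilde` and the random data are assumptions) of evaluating $\mathbf{y}(\mathbf{x}) = \widetilde{\mathbf{W}}^{\mathsf{T}}\tilde{\mathbf{x}}$ and assigning each point to the class with the largest $y_k(\mathbf{x})$:

```python
import numpy as np

# Hypothetical example: D = 2 features, K = 3 classes.
rng = np.random.default_rng(0)
W_tilde = rng.normal(size=(3, 3))          # one row per class k: (w_k0, w_k1, w_k2)
X = rng.normal(size=(5, 2))                # 5 input vectors x
X_tilde = np.hstack([np.ones((5, 1)), X])  # prepend the leading 1 -> x_tilde

Y = X_tilde @ W_tilde.T                    # Y[n, k] = y_k(x_n) = w_k^T x_n + w_k0
labels = Y.argmax(axis=1)                  # assign x_n to the class with the largest y_k
print(labels)
```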
I'm new to machine learning. I have the following scenario: five individuals are each carrying an accelerometer. The sensor measures movement/acceleration on a scale from 0 to 255, with 0 being no movement and 255 being maximum movement, sampled at 5-minute intervals. Some individuals carry sensors that are more sensitive and some less sensitive, so for the same movements some individuals' sensors will report higher values and others lower values. Using a …
I use DeLong's method to compare two ROC AUCs; its output is a Z-score. Both ROC AUCs were obtained from LDA (linear discriminant analysis) in the sklearn package: the first uses the eigen solver inside LDA and the second uses the svd solver. The dotted line is my data; the red line is N(0, 1). Note: there is a minor jump at the point Z = 0. Z = 0 means that the classifiers did their job equally well. Z > 0 (Z …
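For reference, a minimal sketch (my own code, on synthetic data, not the question's) of how the two AUCs being compared might be produced with scikit-learn; the DeLong test itself is not part of scikit-learn and would come from a separate implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the question's dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

aucs = {}
for solver in ("eigen", "svd"):
    clf = LinearDiscriminantAnalysis(solver=solver).fit(X_tr, y_tr)
    scores = clf.decision_function(X_te)   # continuous scores for the ROC curve
    aucs[solver] = roc_auc_score(y_te, scores)

print(aucs)  # the two AUCs whose difference the DeLong Z-score tests
```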
I have asked this question on Mathematics Stack Exchange, but thought it might be a better fit here: I am currently taking a data analysis course and I learned about both the terms LDA (Linear Discriminant Analysis) and FDA (Fisher's Discriminant Analysis). I almost have the feeling that they are used more or less as synonyms in some places, which obviously is not true. Can someone explain to me how these approaches are related? Since LDA's aim is to reduce dimensionality while preserving …
Let the success metric (for a business use case I am working on) be a continuous random variable S. The mean of the pdf defined on S indicates the chance of success: the higher the mean, the greater the chance of success. The standard deviation of that pdf indicates risk: the lower the standard deviation, the lower the risk of failure. I have data, let's call them X, which affect S; let X also be modelled as a collection of random variables. P(S|X) changes based on …
I'm trying to implement the DDPMine algorithm from this article as part of a project, and I do not understand where in the algorithm the class label of each transaction is used. We have transactions from two different groups: suppose group A has the class label "0" and group B has the class label "1". We want to find the discriminative patterns that are frequent in each group but not in the two groups combined, but in which part of …
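For what it's worth, my reading of the 2008 paper is that the class labels enter when candidate patterns are scored by a discriminative measure (information gain) rather than by support alone. A toy sketch of that scoring step (my own code and variable names, not taken from the paper):

```python
import numpy as np

def info_gain(labels, covered):
    """Information gain of a candidate pattern.

    labels  : array of 0/1 class labels, one per transaction
    covered : boolean array, True where the transaction contains the pattern
    This is where the class labels matter: the score compares label entropy
    before and after splitting the transactions on the pattern.
    """
    def entropy(y):
        if len(y) == 0:
            return 0.0
        p = np.bincount(y, minlength=2) / len(y)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    n = len(labels)
    h_before = entropy(labels)
    h_after = (covered.sum() / n) * entropy(labels[covered]) \
            + ((~covered).sum() / n) * entropy(labels[~covered])
    return h_before - h_after

# toy example: 6 transactions, pattern covers exactly the class-0 ones
labels = np.array([0, 0, 0, 1, 1, 1])
covered = np.array([True, True, True, False, False, False])
print(info_gain(labels, covered))  # 1.0 -> perfectly discriminative pattern
```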
I'm looking for any actual working code implementation of the DDPMine algorithm described in the 2008 article "Direct Discriminative Pattern Mining for Effective Classification". I'm having real trouble trying to implement it myself.
I found a post explaining the discriminant function in great detail, but I am still confused about the function $g(\mathbf{x})=\mathbf{w}^{\mathsf{T}}\mathbf{x}+w_0$ in section 9.2, Linear Discriminant Functions and Decision Surfaces. What does it represent graphically? Could anyone explain it, perhaps with figure 9.2? Does it mean the distance between the origin and the hyperplane?
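If it helps, the standard geometric reading from that section (a short worked identity using only the definitions above, nothing new assumed) is that $g(\mathbf{x})$ is proportional to the signed distance from $\mathbf{x}$ to the hyperplane $g(\mathbf{x})=0$:

$$
r(\mathbf{x}) = \frac{g(\mathbf{x})}{\Vert\mathbf{w}\Vert},
\qquad\text{and for } \mathbf{x}=\mathbf{0}:\qquad
r(\mathbf{0}) = \frac{g(\mathbf{0})}{\Vert\mathbf{w}\Vert} = \frac{w_0}{\Vert\mathbf{w}\Vert}.
$$

So $g(\mathbf{x})$ itself is not a distance; it becomes one only after dividing by $\Vert\mathbf{w}\Vert$, and the distance from the origin to the hyperplane is $|w_0|/\Vert\mathbf{w}\Vert$.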
The data set vaso in the robustbase library summarizes the vasoconstriction (or not) of subjects’ fingers along with their breathing volumes and rates.

    > head(vaso)
      Volume  Rate Y
    1   3.70 0.825 1
    2   3.50 1.090 1
    3   1.25 2.500 1
    4   0.75 1.500 1
    5   0.80 3.200 1
    6   0.70 3.500 1

I want to perform a linear discriminant analysis in R to see how well these distinguish between the two groups. And I consider two cases: ld <- lda(Y …
I am trying to manually implement IRLS logistic regression (Chapter 4.3.3 in Bishop, Pattern Recognition and Machine Learning) in Python. For updating the weights, I am using $w' = w-(\Phi^TR\Phi)^{-1}\Phi^T(y-t)$. However, I am not getting satisfying results, and my weights grow unbounded in each iteration. I've written this code so far:

    def y(X, w):
        return sigmoid(X.dot(w))

    def R(y):
        R = np.identity(y.size)
        R = R*(y*(1-y))
        return R

    def irls(X, t):
        w = np.ones(X.shape[1])
        w = w.reshape(w.size, 1)
        t …
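For comparison, here is a minimal self-contained IRLS sketch of the same update rule that I would expect to converge (my own code, not the book's or the question's: the zero initialization, the small ridge term, and the synthetic data are assumptions added for stability and runnability):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls(Phi, t, n_iter=20, reg=1e-6):
    """Newton-Raphson / IRLS updates: w <- w - (Phi^T R Phi)^{-1} Phi^T (y - t)."""
    w = np.zeros(Phi.shape[1])               # starting at zero keeps early steps stable
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)                 # predicted class-1 probabilities
        R = np.diag(y * (1.0 - y))           # weighting matrix R
        H = Phi.T @ R @ Phi + reg * np.eye(Phi.shape[1])  # ridge guards against a
                                                          # singular H on separable data
        w = w - np.linalg.solve(H, Phi.T @ (y - t))
    return w

# usage sketch on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
t = (X[:, 0] + X[:, 1] > 0).astype(float)
Phi = np.hstack([np.ones((100, 1)), X])      # prepend a bias column
w_hat = irls(Phi, t)
print(w_hat)
```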
My task consists of two parts: 1) cluster the data; 2) assign new data to the resulting clusters. I wanted to define the boundaries of each cluster as the min/max values of each coordinate over the observations belonging to it, and then assign observations from the new data to a particular cluster according to those boundaries. However, the problem is that these cluster boundaries intersect, so an observation can fall inside several clusters. What adequate methods can be used to discriminate data …
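One common pattern (a sketch under my own assumptions, not something stated in the question) is to let the clustering model itself assign new points, for example by nearest centroid, instead of using per-coordinate bounding boxes:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_old = rng.normal(size=(300, 2))   # data used for the initial clustering
X_new = rng.normal(size=(20, 2))    # data arriving later

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_old)

# Each new observation goes to exactly one cluster: the one with the nearest
# centroid, so overlapping bounding boxes are never an issue.
new_labels = km.predict(X_new)
print(new_labels)
```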
We might reduce the problem to $c$ two-class problems, where the $i^{th}$ problem is solved by a linear discriminant function that separates points assigned to $w_i$ from those not assigned to $w_i$. A more extravagant approach would be to use $\frac{c(c-1)}{2}$ linear discriminants, one for every pair of classes. For these two methods, I don't understand how they divide the space. If we remove all the lines in Figure 9.3, how should we start?
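To make the first construction concrete (my own sketch, not from the textbook; LogisticRegression simply stands in for any linear discriminant), one can train $c$ "class $i$ vs. not class $i$" classifiers and check how many of them claim a point; points claimed by none or by more than one are exactly the ambiguous regions that Figure 9.3 shades:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Hypothetical 3-class toy data (c = 3).
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# One-vs-rest: c binary linear discriminants, "class i" vs "not class i".
clfs = [LogisticRegression().fit(X, (y == i).astype(int)) for i in range(3)]

def ovr_region(x):
    """Return the class index if exactly one discriminant claims x, else -1
    (i.e. x lies in an ambiguous region)."""
    votes = [clf.predict(x.reshape(1, -1))[0] for clf in clfs]
    return votes.index(1) if sum(votes) == 1 else -1

print(ovr_region(X[0]))
```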
Please refer "Pattern Recognition and Machine Learning" - Bishop, page 182. I am struggling to visualize the intuition behind equations 4.6 & 4.7. I am presenting my understanding of section 4.1.1 using the diagram: Pls. Note: I have used $x_{\perp}$ and $x_{p}$ interchangeably. Equations 4.6, 4.7 from book: $$\mathbf{x} = \mathbf{x_{\perp}} + \textit{r}\mathbf{\frac{w}{\Vert{w}\Vert}} \tag {4.6}$$ Muiltiplying both sides of this result by $\mathbf{w^{T}}$ and adding $w_{0}$, and making use of $y(\mathbf{x}) = \mathbf{w^{T}x} + w_{0}$ and $y(\mathbf{x_{\perp}}) = \mathbf{w^{T}x_{\perp}} + …
My attempt: (a) I solved that $a=\ln{\frac{P(X|C_0)P(C_0)}{P(X|C_1)P(C_1)}}$ (b) Here is where I'm running into trouble. I'm plugging the distributions into $\ln{\frac{P(X|C_0)P(C_0)}{P(X|C_1)P(C_1)}}$ and I get $a=\ln{\frac{P(C_0)}{P(C_1)}}+\frac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)-\frac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)$. I can see that $b=\ln{\frac{P(C_0)}{P(C_1)}}$ and $w^Tx=\frac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)-\frac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)$. I'm not sure how to simplify $w^Tx$ so that I can solve for $w$. Or is there something that I did wrong?
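If I'm reading the setup right (both class conditionals Gaussian with the same shared covariance $\Sigma$), the quadratic terms in $x$ cancel; expanding the two quadratic forms gives

$$
\tfrac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)-\tfrac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)
=(\mu_0-\mu_1)^T\Sigma^{-1}x+\tfrac{1}{2}\mu_1^T\Sigma^{-1}\mu_1-\tfrac{1}{2}\mu_0^T\Sigma^{-1}\mu_0,
$$

so $w=\Sigma^{-1}(\mu_0-\mu_1)$, and the leftover constant $\tfrac{1}{2}\mu_1^T\Sigma^{-1}\mu_1-\tfrac{1}{2}\mu_0^T\Sigma^{-1}\mu_0$ is absorbed into $b$ together with $\ln\frac{P(C_0)}{P(C_1)}$.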
To classify my samples, I decided to use a Naive Bayes classifier, but I coded it myself rather than using built-in library functions. If I use this rule, I obtain nice classification accuracy: p1(x) > p2(x) => x belongs to C1. However, I cannot understand why the discriminant functions produce negative values. If they are probability functions, I think they must generate a value between 0 and 1. Can anyone explain the reason?
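A likely explanation (my guess, since the posted code isn't shown): the discriminants are log-probabilities, or they involve continuous densities, and neither is confined to [0, 1]; only the comparison between them matters. A tiny sketch with made-up numbers:

```python
import numpy as np

# Hypothetical per-class log-discriminants for one sample x:
# g_i(x) = log P(x | C_i) + log P(C_i).  Logs of probabilities below 1 are
# negative, and Gaussian *densities* can even exceed 1, so g_i(x) need not
# lie between 0 and 1.
log_p_x_given_c = np.log(np.array([1e-4, 3e-6]))   # made-up likelihoods
log_prior = np.log(np.array([0.5, 0.5]))

g = log_p_x_given_c + log_prior
print(g)            # both values are negative
print(g.argmax())   # 0 -> x is assigned to C1; only the ordering matters
```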
How is the performance of the Fisher projection compared to other LDA methods of dimension reduction? I thought that the Fisher projection was a great method of dimension reduction by maximizing class separation, but when I looked at the LDA methods in scikit-learn, the Fisher projection wasn't even in the list. This got me thinking: is it any good compared to the other methods out there? Edit (answer): My bad, they are the same. The Fisher projection is a 2-class special case …
I'm following a Linear Discriminant Analysis tutorial from here for dimensionality reduction. After working through the tutorial (I did the PCA part, too), I shortened the code using sklearn modules where applicable and verified it on the Iris data set (same code, same result), on a synthetic data set (with make_classification), and on the sklearn digits dataset. However, I then tried the exact same code on a completely different (unfortunately non-public) data set that contains spectra recordings of two classes. The LDA crashes at …
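For reference, a minimal version of the sklearn pipeline being described (my own sketch: the synthetic data stands in for the spectra, and n_components=1 is simply the maximum LDA allows for two classes):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in for the two-class spectra data (many correlated features).
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=20, n_classes=2, random_state=0)

# With K classes, LDA can project to at most K - 1 dimensions, so two classes
# give a one-dimensional embedding.
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X, y)
print(X_lda.shape)  # (200, 1)
```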
I have been using the LDA package for R, but it is missing quite a few features, especially those that assess the output. Are there any preferred packages that provide some of the following?

- Univariate test statistics
- Canonical analysis
- Multivariate statistics such as Wilks' lambda
This Stack Exchange post - https://stats.stackexchange.com/questions/80507/what-is-a-gaussian-discriminant-analysis-gda - discusses GDA, a machine learning method for classification. I would like to implement something like this analysis in R. I've seen posts on discriminant analysis in R using linear and quadratic discriminant analysis (http://rstudio-pubs-static.s3.amazonaws.com/35817_2552e05f1d4e4db8ba87b334101a43da.html), but nothing for GDA. Because I'm not 100% familiar with these methods, my questions are (1) whether GDA is different from or a subset of LDA and QDA, and if there is an R package / functions …
For nonlinear data, when we use Support Vector Machines we can apply kernels such as the Gaussian RBF, polynomial, etc., to achieve linear separability in a different (potentially unknown to us) feature space, and the algorithm learns a maximum-margin separating hyperplane in that feature space. My question is: how do we create heatmaps such as the one seen in the image below to show this maximal separating hyperplane in our original space, and how should it be interpreted?
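A common recipe (my own sketch on synthetic data, not the code behind the question's image): evaluate the SVM's decision_function on a grid covering the original feature space and plot it as a filled contour; the zero level set is the pre-image of the feature-space hyperplane, i.e. the nonlinear decision boundary:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Hypothetical 2-D nonlinear data standing in for the question's dataset.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Evaluate the signed distance to the hyperplane (measured in feature space)
# on a grid of points in the original 2-D space.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
zz = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, zz, levels=20, cmap="RdBu")   # the heatmap
plt.contour(xx, yy, zz, levels=[0], colors="k")    # zero level = decision boundary
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="RdBu", edgecolors="k")
plt.colorbar(label="decision_function value")
plt.show()
```

Interpreted this way, the color at each point shows how far that point is from the learned hyperplane in the kernel-induced feature space, and the black contour is where the SVM switches its predicted class.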