Neural Network for solving these linear algebra problems

Intro: There are several questions on this site about whether or not machine learning can solve specific problems. The answer (in my words) seems to be: "Yes, trivially, if you choose a model to learn your specific problem, but you may sometimes choose a model that can't represent/approximate the correct hypothesis." I would like to choose a neural network model where, a priori, all I know is that the input is a "linear algebra" kind of function. The Problem: I …
Category: Data Science

Hypothesis vs Hyperplane in Machine Learning

I am finding it hard to understand the clear difference between a hypothesis and a hyperplane. I know that a hypothesis is a candidate model that maps inputs to outputs after training, and that a hyperplane is the decision boundary in a classification algorithm. But I can't seem to understand how the two are differentiated in equations. Can someone help me understand their differences in equations, with some visualizations?
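One way to see the distinction in equations (my notation, not from the question): for a linear classifier, the hypothesis is the whole input-to-label map, while the hyperplane is only its zero level set, i.e. the decision boundary.
$$h_{w,b}(x) = \operatorname{sign}(w^\top x + b) \qquad \text{(hypothesis: maps every input to a label)}$$
$$\{\, x : w^\top x + b = 0 \,\} \qquad \text{(hyperplane: the set where the hypothesis switches label)}$$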
Category: Data Science

How to incorporate the uncertainty of the model coefficients in the prediction interval of a multiple linear regression

I'm dealing with modeling small experimental data sets. As most experimental work does not generate thousands of samples, but rather a handful, I need to be inventive about how to deal with this small number of data points (say 10-20). I've been building a nice framework to do just this, and at this point I am interested in generating error bars with the predicted values. In rough outline, this is what happens in the framework (e.g. when applying a …
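A minimal sketch of the classical OLS prediction interval, which folds the coefficient uncertainty (via $(X^TX)^{-1}$) into the error bars; it assumes a design matrix with an intercept column, and the function name and interface are mine, not the question's framework:

import numpy as np
from scipy import stats

def prediction_interval(X, y, x_new, alpha=0.05):
    # Fit OLS: beta = (X^T X)^{-1} X^T y  (X should include an intercept column).
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    n, p = X.shape
    s2 = resid @ resid / (n - p)        # unbiased estimate of the noise variance
    y_hat = x_new @ beta
    # Variance of a new observation = noise variance + coefficient uncertainty.
    var_pred = s2 * (1.0 + x_new @ XtX_inv @ x_new)
    t = stats.t.ppf(1 - alpha / 2, df=n - p)
    half = t * np.sqrt(var_pred)
    return y_hat - half, y_hat + half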
Category: Data Science

Can I use regression to solve a multiple-equation problem?

I'm working on a problem that is a system of equations. I have a group of people, and each person in the group is working on different tasks (say n tasks in total). Each person in this group works on multiple tasks and completes them. I'd like to estimate the time each type of task takes. I have equations like the one below: #of days person i worked = time(task1) * #tasks of type 1 completed + time(task2) * …
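This is an ordinary least-squares problem: stack the per-person equations into a matrix. A sketch with hypothetical numbers (the task counts and day totals below are made up):

import numpy as np

# A[i, j] = number of tasks of type j person i completed,
# d[i] = number of days person i worked. Estimating per-task times t solves A t ≈ d.
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 2.0],
              [0.0, 1.0, 4.0],
              [2.0, 2.0, 1.0]])
d = np.array([10.0, 11.0, 13.0, 12.0])

t_hat, residuals, rank, _ = np.linalg.lstsq(A, d, rcond=None)
print(t_hat)   # least-squares estimate of the time each task type takes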
Category: Data Science

Dot product and linear regression

I'm studying PCA and my professor said something about finding the linear regression by doing the dot product of both axes. Could someone explain to me why? The dot product returns a number. What's the relationship between that number and the linear regression? In my example, I have two vectors $stat\_grade = [0,1,3,7,10]$ and $physics\_grade = [1,5,8,10,10]$. The first step is normalizing them: $ \frac{physics\_grade - mean(physics\_grade)}{std(physics\_grade)} = [-1.69131435, -0.52489066, 0.34992711, 0.93313895, 0.93313895]$ $ \frac{stat\_grade - mean(stat\_grade)}{std(stat\_grade)} = [-1.11613741, -0.85039041, -0.3188964, …
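For what it's worth, one plausible reading of the professor's remark: the dot product of the standardized vectors, divided by n, is the Pearson correlation r, and for standardized variables r is also the slope of the regression line. A quick check on the question's data:

import numpy as np

stat_grade = np.array([0, 1, 3, 7, 10], dtype=float)
physics_grade = np.array([1, 5, 8, 10, 10], dtype=float)

# Standardize each vector (population std, ddof=0, matching the values above).
s = (stat_grade - stat_grade.mean()) / stat_grade.std()
p = (physics_grade - physics_grade.mean()) / physics_grade.std()

# Dot product of z-scores divided by n = Pearson correlation r;
# for standardized variables, r is also the OLS slope.
r = s @ p / len(s)
print(r, np.corrcoef(stat_grade, physics_grade)[0, 1])   # should match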
Category: Data Science

Gradient descent formula implementation in python

So I recently started Andrew Ng's ML course, and this is the formula Andrew lays out for calculating gradient descent on a linear model. $$ \theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad \text{simultaneously update } \theta_j \text{ for all } j$$ As we see, the formula asks us to sum over all the rows in the data. However, the code below doesn't work if I apply np.sum():
def gradientDescent(X, y, theta, alpha, num_iters): …
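A vectorized sketch of the update the formula describes (my implementation, not the course's starter code). The product X.T @ errors already performs the sum over i for every θ_j at once, which is typically where an extra np.sum() collapses the result to a scalar and breaks the shapes:

import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    # X: (m, n) design matrix, y: (m,) targets, theta: (n,) parameters.
    m = len(y)
    for _ in range(num_iters):
        errors = X @ theta - y        # h_theta(x^(i)) - y^(i) for all i at once
        grad = X.T @ errors / m       # the sum over rows, done in one product
        theta = theta - alpha * grad  # simultaneous update of all theta_j
    return theta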
Category: Data Science

Linear regression with a fixed intercept and everything is in log

I have a set of values for a surface (in pixels) that grows exponentially over time. The surface consists of cells that divide over time. After doing some modelling, I came up with the following formula: $$S(t)=S_{initial}2^{t/a_d},$$ where $a_d$ is the age at which a cell divides. $S_{initial}$ is known. I am trying to estimate $a_d$. I simply tried a $\chi^2$ fit:
# Range of ages of division.
a_range = np.linspace(1, 500, 100)
# Set up an empty vector …
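Since the model is log-linear, an alternative sketch to the χ² grid search is a closed-form least-squares fit with the intercept fixed at log₂(S_initial): taking log₂ of both sides gives log₂(S) − log₂(S_initial) = t/a_d, a line through the origin whose slope 1/a_d has a one-line estimator. The data below are synthetic placeholders:

import numpy as np

t = np.array([10.0, 50.0, 120.0, 300.0, 450.0])   # made-up measurement times
S_init = 100.0
a_true = 200.0
S = S_init * 2 ** (t / a_true)                     # noise-free toy surface values

z = np.log2(S) - np.log2(S_init)
slope = (t @ z) / (t @ t)    # least squares with the intercept fixed at 0
a_d = 1.0 / slope
print(a_d)                    # ~200 on this toy data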
Category: Data Science

Backpropagation with a different sized training set?

I'm trying to create a NN whose input is a (length m) array of 3d vectors $$\vec{x}_i = [x_{i,1},x_{i,2},x_{i,3}], \hspace{5mm}i=1:m $$ and whose output is a similarly sized array: $$\vec{h}_{\theta,i} = [h_{\theta,i1},h_{\theta,i2},h_{\theta,i3}], \hspace{5mm}i=1:m $$ BUT, my training data consists not of 3d vectors but only of the magnitudes/norms of such vectors (with no knowledge of the vector components ($\lambda$'s) themselves): $$y_i= \Vert[\lambda_{i,1},\lambda_{i,2},\lambda_{i,3}]\Vert, \hspace{5mm}i=1:m $$ So, my concept is to use the cost function: $$ J = \frac{1}{2m}\sum_{i=1}^m \left(\Vert\vec{h}_{\theta,i}\Vert - y_i\right)^2 $$ …
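A small numpy sketch (my variable names) of the proposed cost and its gradient with respect to the network outputs; backpropagation only needs dJ/dH, and the chain rule through the norm gives d‖h‖/dh = h/‖h‖:

import numpy as np

def norm_cost_and_grad(H, y):
    # H: (m, 3) network outputs, y: (m,) target magnitudes.
    m = len(y)
    norms = np.linalg.norm(H, axis=1)
    diff = norms - y
    J = (diff ** 2).sum() / (2 * m)
    # dJ/dH via the chain rule through the norm: d||h||/dh = h / ||h||.
    dH = (diff / (m * norms))[:, None] * H
    return J, dH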
Category: Data Science

Nearest neighbor face recognition in eigenspace when using dot product of test set with eigenvectors does not match the performance when using sklearn

I am trying to perform face recognition using PCA (eigenfaces). I have a set of N training images (each of dimension M = w×h), which I have pre-processed into a vertical stack of grayscale intensity vectors, i.e. a matrix of dimensions N×M. For the facial recognition, I am finding the single nearest neighbour of each test image in both the high-dimensional pixel space and the lower-dimensional eigenspace, using the NearestNeighbors class from sklearn. For recognition in the eigenspace, I am contrasting different …
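I can't diagnose the exact mismatch from the excerpt, but a common culprit is forgetting to subtract the training mean before taking dot products with the eigenvectors: sklearn's PCA.transform centers the data first. A sketch with toy stand-in data showing a manual projection that matches sklearn:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)           # toy stand-ins for real face data
X_train = rng.random((40, 32 * 32))      # N flattened training images
X_test = rng.random((5, 32 * 32))

pca = PCA(n_components=20).fit(X_train)
Z_train = pca.transform(X_train)
# The manual dot product must subtract the training mean to match transform():
Z_test = (X_test - pca.mean_) @ pca.components_.T

nn = NearestNeighbors(n_neighbors=1).fit(Z_train)
dist, idx = nn.kneighbors(Z_test)        # nearest training face per test image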
Category: Data Science

Why can't a linear model understand the interaction between any two input features?

The book Deep Learning by Ian Goodfellow states: "Linear models also have the obvious defect that the model capacity is limited to linear functions, so the model cannot understand the interaction between any two input variables." What is meant by "interaction between variables"? How do nonlinear models capture it? It would be great if someone could give an intuitive/graphical/geometrical explanation.
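A concrete illustration (my example, not the book's): XOR depends only on the interaction x₁·x₂, so a linear model scores at chance, while the same linear model succeeds once the product feature is supplied explicitly:

import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR: the label depends only on the interaction x1*x2, not on x1 or x2 alone.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

linear = LogisticRegression().fit(X, y)
print(linear.score(X, y))   # at or near chance: no hyperplane separates XOR

# Hand the model the interaction as an explicit product feature x1*x2.
X_inter = np.hstack([X, (X[:, 0] * X[:, 1])[:, None]])
print(LogisticRegression(C=10.0).fit(X_inter, y).score(X_inter, y))   # 1.0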
Category: Data Science

NCHW input matrix to Dm conversion logic for convolution in cuDNN

I have been trying to understand the convolution lowering operation shown in the cuDNN paper. I was able to understand most of it by reading through and mapping various parameters to the image below. However, I am unable to understand how the original input data (NCHW) was converted into the Dm matrix shown in red. The ordering of the elements of the Dm matrix does not make sense. Can someone please explain this?
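I can't reconstruct the exact element ordering of the paper's Dm figure, but a minimal im2col sketch shows the general lowering idea: each kernel-sized patch of the NCHW input becomes one column of the lowered matrix, so convolution turns into a single matrix multiply (names and shapes below are my choices):

import numpy as np

def im2col(x, kh, kw):
    # x: (N, C, H, W). Returns (N, C*kh*kw, out_h*out_w): one column per output pixel.
    N, C, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((N, C * kh * kw, out_h * out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, :, i:i + kh, j:j + kw]       # (N, C, kh, kw)
            cols[:, :, i * out_w + j] = patch.reshape(N, -1)
    return cols

# Convolution as GEMM: filters F of shape (K, C, kh, kw) flatten to (K, C*kh*kw).
x = np.random.rand(1, 3, 5, 5)
F = np.random.rand(2, 3, 3, 3)
out = F.reshape(2, -1) @ im2col(x, 3, 3)[0]           # (K, out_h*out_w)
print(out.reshape(2, 3, 3).shape)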
Category: Data Science

Pseudo inverse of the covariance matrix?

I've been looking for methods to compute a pseudo-inverse of a covariance matrix, and found that one way is to construct a regularized inverse: build the eigensystem, remove the least significant eigenvalues, and then use the remaining eigenvalues and eigenvectors to form an approximate inverse. Could anyone explain the idea behind this? Thanks in advance.
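A sketch of the construction as I understand it: eigendecompose the covariance, keep only the significant eigenvalues, and invert only on the retained eigenspace. On a deliberately rank-deficient covariance this matches numpy's SVD-based pinv (the tolerance choice is mine):

import numpy as np

def truncated_pinv(C, tol=1e-10):
    # C: symmetric covariance matrix. Drop tiny eigenvalues and invert
    # on the retained eigenspace: pinv = V_k diag(1/s_k) V_k^T.
    vals, vecs = np.linalg.eigh(C)
    keep = vals > tol * vals.max()
    return (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T

X = np.random.rand(50, 5)
X[:, 4] = X[:, 0] + X[:, 1]       # force a singular (rank-deficient) covariance
C = np.cov(X, rowvar=False)
print(np.allclose(truncated_pinv(C), np.linalg.pinv(C)))   # True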
Category: Data Science

Why is the transpose of the independent feature matrix necessary in linear regression?

I can follow the classical linear regression steps:
$$Xw=y$$
$$X^{-1}Xw=X^{-1}y$$
$$Iw=X^{-1}y$$
$$w=X^{-1}y$$
However, on implementing this in Python, I see that instead of simply using w = inv(X).dot(y), they apply w = inv(X.T.dot(X)).dot(X.T).dot(y). What is the explanation for the transpositions and the two multiplications here? I'm confused...
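The short version: X is m×n with m ≠ n in general, so inv(X) does not exist; multiplying by the transpose first produces the square Gram matrix X.T @ X, which is invertible when X has full column rank, giving the normal-equation solution. A quick numerical check on made-up data:

import numpy as np

X = np.random.rand(100, 3)      # 100 samples, 3 features: not square, no inv(X)
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

# Normal equations: X^T X is 3x3 and invertible when X has full column rank.
w = np.linalg.inv(X.T @ X) @ X.T @ y
print(np.allclose(w, w_true))   # True (np.linalg.lstsq is the stabler route)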
Category: Data Science

How is image convolution actually implemented in deep learning libraries using simple linear algebra?

As a clarifier: I want to implement cross-correlation, but the machine learning literature keeps referring to it as convolution, so I will stick with that term. I am trying to implement image convolution using linear algebra. After looking around on the internet and thinking about it, I came up with two possible solutions. The first one: create an appropriate Toeplitz-like matrix out of the kernel, as described here. The second one: instead of the filter, modify …
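A sketch of the first (Toeplitz-like) option, under my own naming: build a matrix whose rows are shifted, flattened copies of the kernel, so that a single matrix-vector product computes the valid cross-correlation:

import numpy as np

def conv_matrix(k, H, W):
    # Build T of shape (out_h*out_w, H*W) such that T @ x.ravel() equals the
    # valid cross-correlation of x with kernel k.
    kh, kw = k.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    T = np.zeros((out_h * out_w, H * W))
    for i in range(out_h):
        for j in range(out_w):
            row = np.zeros((H, W))
            row[i:i + kh, j:j + kw] = k       # kernel placed at output position (i, j)
            T[i * out_w + j] = row.ravel()
    return T

x = np.random.rand(4, 4)
k = np.random.rand(2, 2)
out = (conv_matrix(k, 4, 4) @ x.ravel()).reshape(3, 3)
# Sanity check against a direct sliding-window cross-correlation:
direct = np.array([[(x[i:i+2, j:j+2] * k).sum() for j in range(3)] for i in range(3)])
print(np.allclose(out, direct))   # True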
Category: Data Science

Difference between FDA and LDA

I asked this question on Mathematics Stack Exchange, but thought it might be a better fit here. I am currently taking a data-analysis course, and I learned about both the terms LDA (Linear Discriminant Analysis) and FDA (Fisher's Discriminant Analysis). I almost have the feeling that they are used as near-synonyms in some places, which obviously is not quite right. Can someone explain to me how these approaches are related? Since LDA's aim is to reduce dimensionality while preserving …
Category: Data Science

Is a regression line a 1-D affine subspace of a 2-D vector space?

Background: I am currently reading a book called "Mathematics for Machine Learning", specifically chapter 2 on linear algebra, and in particular subchapter 2.8 on affine spaces. The thing is, I learned from the book that affine subspaces are points, lines, and planes in $ \mathbb{R}^{3} $ which don't (necessarily) go through the origin. An affine subspace is defined as $$ L = x_{0} + \lambda b_{1} $$ where: $L$ is the affine subspace, $x_{0}$ is a support …
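As a worked instance (my notation): a regression line with intercept $\beta_0$ and slope $\beta_1$ fits the book's definition exactly, so it is indeed a 1-D affine subspace of $\mathbb{R}^{2}$:
$$ L = \underbrace{\begin{pmatrix} 0 \\ \beta_{0} \end{pmatrix}}_{x_{0}} + \lambda \underbrace{\begin{pmatrix} 1 \\ \beta_{1} \end{pmatrix}}_{b_{1}}, \qquad \lambda \in \mathbb{R}, $$
and it passes through the origin only when $\beta_{0} = 0$.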
Category: Data Science

Understanding Lagrangian equation for SVM

I was trying to understand the Lagrangian from the SVM section of Andrew Ng's Stanford CS229 course notes. On pages 17 and 18, he says: Given the problem $$\begin{align} \min_w & \quad f(w) \\ \text{s.t.} & \quad h_i(w)=0,\; i=1,\dots,l \end{align}$$ the Lagrangian can be given as follows: $$\mathcal{L}(w,\beta)=f(w)\color{red}{+}\sum_{i=1}^l\beta_ih_i(w)\quad\quad\quad \text{...equation(1)}$$ Here, the $\beta_i$'s are Lagrange multipliers. While reading about Lagrange multipliers in a Khan Academy article, I found that it gives the Lagrangian as: $$ \mathcal{L}(x,y,…,λ)=f(x,y,…)\color{red}{−}λ(g(x,y,…)−c) \quad\quad\quad \text{...equation(2)}$$ Here, $g$ is a constraint and …
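For what it's worth, the two forms are reconcilable (a standard fact about equality constraints, not something stated in either source): the multiplier of an equality constraint has unconstrained sign, so the minus can be absorbed into it:
$$ f(w) + \sum_{i=1}^l \beta_i h_i(w) \;=\; f(w) - \sum_{i=1}^l \lambda_i h_i(w), \qquad \lambda_i = -\beta_i. $$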
Category: Data Science

Understanding SVM mathematics

I was referring to the SVM section of Andrew Ng's course notes for the Stanford CS229 Machine Learning course. On pages 14 and 15, he says: Consider the picture below: How can we find the value of $\gamma^{(i)}$? Well, $w/\Vert w\Vert$ is a unit-length vector pointing in the same direction as $w$. Since point $A$ represents $x^{(i)}$, we find that point $B$ is given by $x^{(i)} - \gamma^{(i)}\cdot w/\Vert w\Vert$. But this point lies on the decision boundary, and all points $x$ …
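The truncated step presumably continues as in the standard derivation: every point on the decision boundary satisfies $w^\top x + b = 0$, so substituting $B$ gives the geometric margin in closed form:
$$ w^\top\!\left(x^{(i)} - \gamma^{(i)}\frac{w}{\Vert w\Vert}\right) + b = 0 \;\Longrightarrow\; \gamma^{(i)} = \frac{w^\top x^{(i)} + b}{\Vert w\Vert}. $$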
Category: Data Science
