How exactly do Gaussian Processes (squared-exponential kernel) enforce smoothness? (I.e., how are they computed so as to do so?)

From: http://www.cs.cmu.edu/~16831-f12/notes/F10/16831_lecture22_jlisee/16831_lecture22.jlisee.pdf

"Gaussian Processes artificially introduce correlation between close samples in that vector in order to enforce some sort of smoothness on the succession of samples."

But how is this computed? Is the function $f(x) \sim GP(\mu, k(x, x'))$ evaluated incrementally? E.g., does the $n$-th calculated value $f(x_n)$ use the values $f(x_1), \dots, f(x_{n-1})$ to compute its mean and variance?

Topic gaussian-process

Category Data Science


But how is this computed?

The vector that is referred to is the vector of samples $V = (X_1, X_2, \dots, X_n)$. The joint probability density of that vector is $$p(X) = \frac{1}{z} \cdot e^{-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)}$$ where $\Sigma$ is the variance-covariance matrix: the values on the main diagonal are the variance of each sample point with itself, and the other elements are the covariances between samples. For instance, the element $\Sigma_{ij}$ is the covariance between sample $i$ and sample $j$. Recall the empirical covariance formula: $$\operatorname{cov}(x, y) = \frac{1}{n} \sum_{i=1}^n (x_i - \mu_x)\cdot(y_i - \mu_y)$$ Covariance is largest when $x = y$ and grows as the two samples become more similar to each other, so the matrix $\Sigma$ expresses the similarity between the samples.

The key point is that in a Gaussian Process these covariances are not estimated from data: they are generated directly by the kernel, $\Sigma_{ij} = k(x_i, x_j)$. A squared-exponential kernel assigns high covariance to inputs that are close together, which is exactly the "artificially introduced correlation" the lecture notes mention; a signal-variance scale factor also multiplies all the elements of this matrix.
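To make the construction concrete, here is a minimal sketch of building $\Sigma$ from a squared-exponential kernel for a handful of 1-D inputs. The function name `rbf_kernel` and the hyperparameter values (`length_scale`, `variance`) are illustrative choices, not anything fixed by the GP definition:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel: k(x, x') = s^2 * exp(-(x-x')^2 / (2 l^2))."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

# Five 1-D input locations.
x = np.array([0.0, 0.5, 1.0, 2.0, 5.0])
Sigma = rbf_kernel(x, x)

# Diagonal entries are the variances (here 1.0); off-diagonal
# entries shrink as the distance between the inputs grows.
print(np.round(Sigma, 3))
```

Note that close inputs (e.g. 0.0 and 0.5) get a covariance near 1, while distant ones (0.0 and 5.0) get a covariance near 0 — that is the entire smoothness mechanism.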

So if we have $n$ data samples, each of them having $d$ dimensions, $X$ would be an $n \times d$ matrix, $\mu$ an $n \times 1$ vector (one mean per sample), and $\Sigma$ an $n \times n$ matrix.

Is the function $f(x) \sim GP(\mu, k(x, x'))$ evaluated incrementally? E.g., does the $n$-th calculated value $f(x_n)$ use the values $f(x_1), \dots, f(x_{n-1})$ to compute its mean and variance?

No, it is not. We know that $f(x)$ is a vector: $$f(x) = [f(x_1), f(x_2), \dots, f(x_n)]^T \sim \mathcal{N}(\mu, K), \qquad K_{ij} = k(x_i, x_j)$$ and the whole vector is drawn jointly from this single multivariate Gaussian, not built up one value at a time. The dependence between nearby values comes entirely from the kernel matrix $K$, not from any incremental computation.
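A short sketch of that joint draw, assuming a zero mean and a squared-exponential kernel (the helper `rbf_kernel`, the grid, and the tiny diagonal jitter for numerical stability are illustrative, not part of the GP definition):

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
Sigma = rbf_kernel(x, x) + 1e-10 * np.eye(len(x))  # jitter for stability
mu = np.zeros(len(x))

# One draw of the ENTIRE vector f = [f(x_1), ..., f(x_n)] at once.
f = rng.multivariate_normal(mu, Sigma)

# Neighbouring entries move together because their covariance is
# close to 1, so consecutive differences stay small.
print(np.max(np.abs(np.diff(f))))
```

If you drew the same 50 values independently (an identity covariance instead of `Sigma`), consecutive differences would be much larger — the smoothness is purely a property of the joint covariance.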


It's by definition: when you fit a Gaussian process you specify the mean function m(x) and the covariance function (or kernel) k(x, x'). Often the mean function is 0 and the covariance is the radial basis function, or squared-exponential, kernel, which is smooth (in fact, sample paths are infinitely differentiable).
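To see why this particular kernel enforces smoothness, it helps to look at how the correlation it assigns decays with distance. A minimal sketch, assuming unit signal variance and writing the kernel as a function of the distance $r = |x - x'|$ only (valid because the kernel is stationary):

```python
import numpy as np

def k(r, length_scale=1.0):
    """Squared-exponential kernel as a function of distance r:
    k(r) = exp(-r^2 / (2 l^2))."""
    return np.exp(-0.5 * (r / length_scale) ** 2)

# Correlation is ~1 for nearby points and falls off smoothly with distance.
for r in [0.0, 0.1, 1.0, 3.0]:
    print(f"r = {r}: k(r) = {k(r):.4f}")
```

Because the correlation between two function values approaches 1 smoothly as their inputs approach each other, the resulting sample paths cannot jump between nearby inputs; a kernel that decays less smoothly (e.g. the exponential kernel exp(-r/l)) would give rougher, non-differentiable paths.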
