Gaussian Process for Classification: How to do predictions using MCMC methods
Problem
I was reading about Gaussian Processes for regression in the "Gaussian Processes for Classification" textbook and in a few other online resources. Everywhere I look, people seem to avoid explaining how one would actually make predictions with sampling (MCMC) methods. Can anyone provide a simple answer to this?
Mathematics and Context
- $X\in\mathbb{R}^{n\times d}$ is a matrix whose rows ${\bf{x}}_i$ are the $n$ training observations living in $d$-dimensions.
- ${\bf{y}}$ is an $n$-dimensional vector containing the training labels, one per training input, each either $0$ or $1$.
- ${\bf{f}}$ is an $n$-dimensional vector whose elements $f_i={\bf{x}}_i^\top {\bf{w}}$ are the so-called linear predictors.
- ${\bf{x}}_*$ is a testing input.
Formulation of Gaussian Process for Classification
Inference is performed in two steps:
- Compute $$ p(f_*\mid X, {\bf{y}}, {\bf{x}}_*)=\int p(f_*\mid X, {\bf{x}}_*, {\bf{f}})p({\bf{f}}\mid X, {\bf{y}})d {\bf{f}} $$
- Squash the latent value through the sigmoid function and average over its predictive distribution to find the class probability. $$ \overline{\pi}_* = p(y_*=1\mid X, {\bf{y}}, {\bf{x}}_*) = \int \sigma(f_*)p(f_*\mid X, {\bf{y}}, {\bf{x}}_*)d f_* $$
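To make the second step concrete, here is a minimal sketch of how that one-dimensional integral could be approximated by a Monte Carlo average, assuming draws of $f_*$ from $p(f_*\mid X, {\bf{y}}, {\bf{x}}_*)$ are already available. The function names and the Gaussian used as a stand-in for that distribution (mean 1, standard deviation 2) are made up purely for illustration.

```python
import numpy as np

def sigmoid(f):
    """Logistic sigmoid, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-f))

def averaged_class_probability(f_star_draws):
    """Monte Carlo estimate of  int sigma(f_*) p(f_* | X, y, x_*) df_*."""
    return float(np.mean(sigmoid(np.asarray(f_star_draws))))

rng = np.random.default_rng(0)

# ASSUMPTION: a made-up Gaussian stands in for p(f_* | X, y, x_*).
# In practice these draws would come out of the first inference step.
f_star_draws = rng.normal(loc=1.0, scale=2.0, size=10_000)

pi_bar = averaged_class_probability(f_star_draws)
plug_in = float(sigmoid(1.0))  # sigmoid evaluated at the mean only

print("averaged prediction:", pi_bar)
print("plug-in prediction: ", plug_in)
```

Because the sigmoid is nonlinear, the averaged prediction generally differs from the sigmoid evaluated at the mean of $f_*$; for a mean above zero, the averaged value is pulled towards $0.5$.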
What I don't understand
How does one go about evaluating these integrals using sampling methods? My idea is that the first integral is an expectation with respect to $p({\bf{f}}\mid X, {\bf{y}})$, so maybe we can do something like this.
- Draw samples ${\bf{f}}_1, \ldots, {\bf{f}}_N$ from $p({\bf{f}}\mid X, {\bf{y}})$ and approximate the first integral by the Monte Carlo average $$ \mathbb{E}_{p({\bf{f}}\mid X, {\bf{y}})}\left[p(f_*\mid X, {\bf{x}}_*, {\bf{f}})\right] \approx \frac{1}{N}\sum_{i=1}^N p(f_*\mid X, {\bf{x}}_*, {\bf{f}}_i) $$ But then how do I compute $p(f_*\mid X, {\bf{x}}_*, {\bf{f}}_i)$ for each sample?
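One possible sketch of this idea, under assumptions that are not part of the question: a zero-mean GP prior ${\bf{f}}\sim\mathcal{N}({\bf{0}}, K)$ with a squared-exponential kernel, a logistic likelihood, and elliptical slice sampling as the MCMC method (any sampler targeting $p({\bf{f}}\mid X, {\bf{y}})$ could be swapped in). Under a GP prior the conditional is Gaussian, $p(f_*\mid X, {\bf{x}}_*, {\bf{f}}) = \mathcal{N}\left({\bf{k}}_*^\top K^{-1}{\bf{f}},\; k({\bf{x}}_*, {\bf{x}}_*) - {\bf{k}}_*^\top K^{-1}{\bf{k}}_*\right)$, so for each posterior sample ${\bf{f}}_i$ one can draw $f_*$ from it and average $\sigma(f_*)$. All function names, the kernel hyperparameters, and the jitter value are hypothetical choices.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of A and rows of B (assumed kernel)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def log_lik(f, y):
    """Bernoulli log-likelihood with logistic link, labels y in {0, 1}."""
    return -np.sum(np.logaddexp(0.0, -(2.0 * y - 1.0) * f))

def elliptical_slice_step(f, L, y, rng):
    """One elliptical slice sampling update targeting p(f | X, y)."""
    nu = L @ rng.standard_normal(f.shape[0])           # prior draw, N(0, K)
    log_u = log_lik(f, y) + np.log(1.0 - rng.uniform())  # slice height
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta
    while True:
        f_new = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_new, y) > log_u:
            return f_new
        # Shrink the angle bracket towards theta = 0 (the current state).
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

def predict_proba(X, y, x_star, n_samples=500, burn_in=200, jitter=1e-6, seed=0):
    """Monte Carlo estimate of p(y_* = 1 | X, y, x_*)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = rbf_kernel(X, X) + jitter * np.eye(n)
    L = np.linalg.cholesky(K)
    k_star = rbf_kernel(X, x_star[None, :])[:, 0]
    a = np.linalg.solve(K, k_star)                     # a = K^{-1} k_*
    k_ss = rbf_kernel(x_star[None, :], x_star[None, :])[0, 0]
    var_star = max(k_ss - k_star @ a, 0.0)             # conditional variance
    f = np.zeros(n)
    probs = []
    for it in range(burn_in + n_samples):
        f = elliptical_slice_step(f, L, y, rng)
        if it >= burn_in:
            # Draw f_* from the Gaussian conditional p(f_* | X, x_*, f_i).
            f_star = a @ f + np.sqrt(var_star) * rng.standard_normal()
            probs.append(1.0 / (1.0 + np.exp(-f_star)))
    return float(np.mean(probs))

# Toy 1-D data (made up): label 1 for positive inputs, 0 otherwise.
X = np.linspace(-3.0, 3.0, 20)[:, None]
y = (X[:, 0] > 0).astype(float)
print(predict_proba(X, y, np.array([2.0])))
print(predict_proba(X, y, np.array([-2.0])))
```

Note that this sketch never evaluates the density $p(f_*\mid X, {\bf{x}}_*, {\bf{f}}_i)$ on a grid. Sampling $f_*$ from it and averaging $\sigma(f_*)$ handles both integrals at once, since the pairs $({\bf{f}}_i, f_{*,i})$ are draws from the joint posterior.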
Topic gaussian-process sampling classification
Category Data Science