Confidence intervals in multivariate linear regression

I am fitting my data to a multivariate linear regression $Y = XB + \Xi$, where the response is bivariate, $Y\in \mathbb{R}^{n\times 2}$, and the predictor is univariate but lifted to homogeneous coordinates (a column of ones is appended) to account for the intercept, so $X\in \mathbb{R}^{n\times 2}$ and $B\in \mathbb{R}^{2\times 2}$.

Now, finding the best fit reduces to $\hat B = (X^T X)^{-1}X^T Y$.
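As a minimal sketch of this estimator, assuming the model is read as $Y = XB + \Xi$ (which is what the formula for $\hat B$ implies) and using synthetic data, the estimate can be computed with NumPy. The names `B_true`, `x` and the noise scale are hypothetical and only serve the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption: not from the question).
n = 50
x = rng.uniform(-1.0, 1.0, size=n)             # univariate predictor
X = np.column_stack([np.ones(n), x])           # n x 2 design matrix with intercept column
B_true = np.array([[1.0, -2.0],                # row 0: intercepts of the two responses
                   [0.5,  3.0]])               # row 1: slopes of the two responses
Y = X @ B_true + 0.1 * rng.standard_normal((n, 2))   # n x 2 bivariate response

# B_hat = (X^T X)^{-1} X^T Y, solved without forming the inverse explicitly.
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# The same estimate via a least-squares solver, which is numerically safer.
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(B_hat)
```

Each column of `B_hat` holds the intercept and slope for one of the two responses.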

But I am interested in finding a $0.7$ confidence region around $\hat B$. How do I do that?



You could construct a Bayesian linear regression model to find the posterior $p(\theta\mid\mathcal{D})$ (where $\theta$ denotes the model parameters) and report the credible interval you're interested in, given the dataset $\mathcal{D} := \{ (x_i, y_i) \mid i = 1, 2, \dots, n \}$, where $x_i \in \mathbb{R}$ and $y_i \in \mathbb{R}^2$.

We will fit one regressor per target, i.e. two models, since the output is two-dimensional.

Linear model formulation

There are of course many options for choosing the underlying likelihood and priors of our model, but for clarity we will go for simple linear regression with both Gaussian likelihood and prior.

Likelihood: $$ p(y_{ij} \mid x_i ,\theta_j) = \mathcal{N}(\theta_{j_0} + \theta_{j_1}x_i, \sigma_j) $$

Priors:

$$ \theta_j \sim \mathcal{N}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, I\right) $$

$$ \sigma_j \sim \text{HalfNormal}(10) $$

Posterior: $$ p(\theta_j \mid \mathcal{D}) \propto \prod_{i=1}^{n}p(y_{ij} \mid x_i ,\theta_j) \ p(\theta_j)$$

This posterior is the target of your analysis: from it you report the $0.7$ credible interval for each component of $\theta_j$.
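To make the credible interval concrete, here is a simplified sketch rather than the full model above. It assumes the noise scale $\sigma_j$ is known and fixed instead of having a HalfNormal prior, because in that case the posterior of $\theta_j$ under the $\mathcal{N}(0, I)$ prior is Gaussian in closed form and no sampler is needed. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def gaussian_posterior(X, y, sigma):
    """Posterior mean and covariance of theta for prior N(0, I) and
    likelihood y ~ N(X theta, sigma^2 I), with sigma assumed known."""
    precision = np.eye(X.shape[1]) + X.T @ X / sigma**2
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / sigma**2
    return mean, cov

def credible_intervals(mean, cov, level=0.7):
    """Central marginal credible interval for each component of theta."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # 0.85 quantile when level = 0.7
    half_width = z * np.sqrt(np.diag(cov))
    return np.column_stack([mean - half_width, mean + half_width])

# Synthetic data for one target j (assumption: not from the question).
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
theta_true = np.array([0.5, -1.0])
sigma = 0.3
y = X @ theta_true + sigma * rng.standard_normal(n)

mean, cov = gaussian_posterior(X, y, sigma)
ci = credible_intervals(mean, cov, level=0.7)
print(ci)   # row 0: intercept interval, row 1: slope interval
```

With a bivariate response the same two functions would be applied once per column of $Y$. When $\sigma_j$ is given a prior, as in the formulation above, the posterior is no longer Gaussian and a sampler is the usual route.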

If you're using Python, this blog post illustrates how to build a Bayesian linear regression model using pymc3.


Bayesian linear regression can provide a credible region for the regression coefficients, which is the Bayesian counterpart of the confidence region you are asking for.


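From https://en.wikipedia.org/wiki/Simple_linear_regression:

The t-value $t = (\widehat\beta - \beta)/s_{\widehat\beta}$, where $s_{\widehat\beta}$ is the standard error of the estimate $\widehat\beta$, has a Student's t-distribution with $n-2$ degrees of freedom. Using it we can construct a confidence interval for $\beta$: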

$$ \beta \in \left[\widehat\beta - s_{\widehat\beta} t^*_{n - 2},\ \widehat\beta + s_{\widehat\beta} t^*_{n - 2}\right] $$

at confidence level $1-\gamma$, where $t^*_{n - 2}$ is the $(1-\frac{\gamma}{2})$-th quantile of the $t_{n-2}$ distribution. For a $0.7$ confidence level, $\gamma = 0.3$, so $t^*_{n-2}$ is the $0.85$ quantile.
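The interval above can be sketched for the bivariate setting by treating each response column separately. This is an illustration under assumptions, not a definitive implementation: the helper `coef_confidence_intervals` and the synthetic data are hypothetical, the errors are assumed Gaussian with constant variance per response, and SciPy is used only for the t quantile.

```python
import numpy as np
from scipy import stats

def coef_confidence_intervals(X, Y, level=0.7):
    """Per-coefficient t-intervals for each response column of Y.

    X is the n x p design matrix (p = 2 here: intercept and slope),
    Y is the n x q response matrix (q = 2 here).
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ Y
    residuals = Y - X @ B_hat
    s2 = (residuals**2).sum(axis=0) / (n - p)          # residual variance per response
    se = np.sqrt(np.outer(np.diag(XtX_inv), s2))       # p x q standard errors
    gamma = 1.0 - level
    t_star = stats.t.ppf(1.0 - gamma / 2.0, df=n - p)  # (1 - gamma/2) quantile of t_{n-2}
    return B_hat, B_hat - t_star * se, B_hat + t_star * se

# Synthetic data (assumption: not from the question).
rng = np.random.default_rng(2)
n = 40
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
B_true = np.array([[1.0, -2.0],
                   [0.5,  3.0]])
Y = X @ B_true + 0.5 * rng.standard_normal((n, 2))

B_hat, lower, upper = coef_confidence_intervals(X, Y, level=0.7)
print(lower)
print(upper)
```

These are marginal intervals, one coefficient at a time. They do not form a joint $0.7$ confidence region for all entries of $\hat B$ simultaneously, which would require a different construction.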
