Deriving VIF equation from the matrix form of Least Squares equation
I have been working through the derivation of the formula used to calculate the Variance Inflation Factor (VIF) of a model. I am hoping to start with the least squares equation in matrix form and work towards the result derived here: derivation of VIF
I understand that correlation is covariance scaled by the standard deviations, and that $VIF_j$, for the $j$th predictor, is the $j$th diagonal entry of the inverse of the correlation matrix. But how is this related to $VIF_{j}=\frac{\mathrm{Var}(\hat{\beta}_j)}{\sigma^2}$?
I'd like to understand how to get from the matrix form of the least squares solution, $\hat{\beta} = (X^TX)^{-1}X^TY$, to $VIF_j=\frac{\mathrm{Var}(\hat{\beta}_j)}{\sigma^2}$.
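To show my work so far: the furthest I can get on my own is the standard variance step, assuming homoskedastic errors, i.e. $Y = X\beta + \varepsilon$ with $\mathrm{Var}(\varepsilon) = \sigma^2 I$:

$$
\mathrm{Var}(\hat{\beta}) = \mathrm{Var}\big((X^TX)^{-1}X^TY\big) = (X^TX)^{-1}X^T \,\sigma^2 I\, X(X^TX)^{-1} = \sigma^2 (X^TX)^{-1},
$$

so $\mathrm{Var}(\hat{\beta}_j) = \sigma^2\big[(X^TX)^{-1}\big]_{jj}$. What I can't see is the next step: how the $j$th diagonal entry of $(X^TX)^{-1}$ connects to the $j$th diagonal entry of the inverse correlation matrix.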
Honestly, this is my main question and everything else is secondary. THANK YOU to anyone who can help me here! :)
From scanning multiple Stack Overflow questions, it seems I can deduce that $X^TY$ is the variance? And that $(X^TX)^{-1}$ is the correlation matrix, or the standardized covariance matrix? (see highlighted portions #1 and #2). However, I'm not able to prove this or find information online that would shed light on the issue.
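While it isn't a proof, I did sanity-check numerically that the diagonal of the inverse correlation matrix matches the other common definition, $VIF_j = 1/(1-R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the remaining predictors. This is just a sketch with simulated data (NumPy only; variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Three predictors; x2 is deliberately collinear with x1
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Definition 1: VIFs as the diagonal of the inverse correlation matrix
R = np.corrcoef(X, rowvar=False)
vif_corr = np.diag(np.linalg.inv(R))

# Definition 2: VIF_j = 1 / (1 - R_j^2), regressing x_j on the others
vif_r2 = []
for j in range(X.shape[1]):
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(n), others])  # include an intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    vif_r2.append(1.0 / (1.0 - r2))
vif_r2 = np.array(vif_r2)

print(vif_corr)
print(vif_r2)  # agrees with vif_corr to numerical precision
```

The two vectors agree, and the VIFs for x1 and x2 are much larger than for x3, as expected given the induced collinearity. But I still can't bridge from this to the $\mathrm{Var}(\hat{\beta}_j)/\sigma^2$ form.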
I would really love if someone explained or gave insight on either question! Thank you so much!
Topic: collinearity, mathematics
Category: Data Science