Deriving the VIF equation from the matrix form of the least squares equation

I have been working through the derivation of the formula used to calculate the Variance Inflation Factor (VIF) for a regression model. I am hoping to start from the least squares equation in matrix form and arrive at the VIF formula linked here: derivation of VIF

I understand that a correlation is a covariance divided by the product of the standard deviations, and that $VIF_j$ for the $j$th predictor is the $j$th diagonal entry of the inverse of the correlation matrix. But how is this related to $VIF_{j}=\frac{\operatorname{Var}(\hat{\beta}_j)}{\sigma^2}$?
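To check that first fact numerically, here is a small sketch I put together with made-up data (the variables and numbers are purely illustrative): the diagonal of the inverse of the predictors' correlation matrix matches $1/(1-R_j^2)$ from regressing each predictor on the others.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up predictors with some collinearity between x1 and x2.
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# VIFs as the diagonal of the inverse of the predictors' correlation matrix.
R = np.corrcoef(X, rowvar=False)
vif_from_corr = np.diag(np.linalg.inv(R))

# VIFs as 1 / (1 - R_j^2), regressing each predictor on the others (with intercept).
vif_from_r2 = []
for j in range(X.shape[1]):
    target = X[:, j]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, target, rcond=None)
    resid = target - others @ coef
    r2 = 1.0 - resid.var() / target.var()
    vif_from_r2.append(1.0 / (1.0 - r2))

print(vif_from_corr)
print(np.array(vif_from_r2))  # matches the line above (up to floating point)
```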

I'd like to understand how to get from the matrix form of the least squares estimator, $\hat{\beta} = (X^TX)^{-1}X^TY$, to $VIF_j=\frac{\operatorname{Var}(\hat{\beta}_j)}{\sigma^2}$.
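The closest I can get on my own is the standard variance calculation, assuming homoskedastic errors so that $\operatorname{Var}(Y \mid X) = \sigma^2 I$:

$$
\operatorname{Var}(\hat{\beta}) = \operatorname{Var}\!\big((X^TX)^{-1}X^TY\big) = (X^TX)^{-1}X^T\,\operatorname{Var}(Y)\,X(X^TX)^{-1} = \sigma^2 (X^TX)^{-1},
$$

so $\operatorname{Var}(\hat{\beta}_j) = \sigma^2\big[(X^TX)^{-1}\big]_{jj}$. But I don't see how the $jj$ entry of $(X^TX)^{-1}$ turns into the $jj$ entry of the inverse correlation matrix, i.e. the VIF.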

Honestly, this is my main question and everything else is secondary. THANK YOU to anyone who can help me here!:)

From scanning multiple Stack Overflow questions, it seems I can deduce that $X^TY$ corresponds to some kind of (co)variance term, and that $(X^TX)^{-1}$ is related to the correlation matrix or a standardized covariance matrix? However, I'm not able to prove this or find anything online that sheds light on it.
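Here is a quick numerical check I tried (again with made-up data): after centering each column and scaling it to unit sample standard deviation, $X^TX/(n-1)$ does appear to equal the correlation matrix of the predictors, and $X^TY/(n-1)$ the vector of predictor-response correlations. So perhaps the statements I found only hold for standardized variables?

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: three predictors (two of them correlated) and a response.
n = 400
X = rng.normal(size=(n, 3))
X[:, 1] += 0.5 * X[:, 0]
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Standardize: center each column and scale to unit sample standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# For standardized predictors, X^T X / (n - 1) is the sample correlation matrix...
print(np.allclose(Z.T @ Z / (n - 1), np.corrcoef(X, rowvar=False)))   # True

# ...and X^T Y / (n - 1) is the vector of predictor-response correlations.
full_corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
print(np.allclose(Z.T @ zy / (n - 1), full_corr[:-1, -1]))            # True
```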

I would really love it if someone could explain either question or offer some insight! Thank you so much!

Topic: collinearity, mathematics

Category: Data Science
