How to interpret Variance Inflation Factor (VIF) results?
From various books and blog posts, I understand that the Variance Inflation Factor (VIF) is used to detect collinearity, and that a VIF of up to 10 is generally considered acceptable. But I have a question.
In the output below, tax has the highest VIF (7.28), closely followed by rad (6.86); both are below the commonly cited threshold of 10.
How does VIF measure collinearity when we pass an entire fitted linear model to the function? How should I interpret the values it returns? And which variables are collinear with which?
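library(MASS)  # provides the Boston data set
library(car)   # provides vif() (assuming the car package is the one in use)
lm.fit2 <- lm(medv ~ . + log(lstat) - age - indus - lstat, data = Boston)
summary(lm.fit2)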
Call:
lm(formula = medv ~ . + log(lstat) - age - indus - lstat, data = Boston)

Residuals:
     Min       1Q   Median       3Q      Max
-15.3764  -2.5604  -0.3867   1.8456  25.2255

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  53.942455   4.823309  11.184  < 2e-16 ***
crim         -0.126273   0.029185  -4.327 1.83e-05 ***
zn            0.021993   0.012238   1.797 0.072934 .
chas          2.270669   0.768911   2.953 0.003296 **
nox         -13.959428   3.187365  -4.380 1.45e-05 ***
rm            2.619831   0.378737   6.917 1.43e-11 ***
dis          -1.374045   0.166350  -8.260 1.35e-15 ***
rad           0.286993   0.057004   5.035 6.72e-07 ***
tax          -0.010756   0.003033  -3.546 0.000428 ***
ptratio      -0.840540   0.116431  -7.219 1.99e-12 ***
black         0.008015   0.002402   3.336 0.000913 ***
log(lstat)   -8.672865   0.530188 -16.358  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.258 on 494 degrees of freedom
Multiple R-squared:  0.7904,  Adjusted R-squared:  0.7857
F-statistic: 169.3 on 11 and 494 DF,  p-value: < 2.2e-16
vif(lm.fit2)
      crim         zn       chas        nox         rm        dis
  1.755719   2.269767   1.062622   3.800515   1.972845   3.418391
       rad        tax    ptratio      black log(lstat)
  6.863674   7.279426   1.770146   1.340023   2.827687
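
For reference, the textbook definition of the VIF for predictor j is 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on all the other predictors in the model. The sketch below reproduces the numbers by hand under that assumption. It is not the internals of vif(); `X` and `manual_vif` are names introduced here purely for illustration.

```r
library(MASS)  # Boston data set

# Same model as in the question.
fit <- lm(medv ~ . + log(lstat) - age - indus - lstat, data = Boston)

# Design matrix of the predictors, with the intercept column dropped.
X <- model.matrix(fit)[, -1]

# For each predictor: regress it on all the other predictors,
# take that regression's R-squared, and compute 1 / (1 - R^2).
manual_vif <- sapply(colnames(X), function(j) {
  others <- X[, colnames(X) != j, drop = FALSE]
  r2 <- summary(lm(X[, j] ~ others))$r.squared
  1 / (1 - r2)
})

round(manual_vif, 6)

# A VIF only says that a predictor is well explained by the others jointly.
# To get a hint of which variables it overlaps with, one option is to
# inspect the pairwise correlations of the high-VIF predictors.
round(cor(X)[c("rad", "tax"), ], 2)
```

A VIF on its own does not name the partner variables, which is why the last line looks at the correlation rows for rad and tax; pairwise correlations can still miss collinearity that involves three or more predictors at once.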