VIF vs. Mutual Information
While searching for good feature-selection methods for a regression problem, I came across a post suggesting mutual information for regression, and I tried it on the Boston housing dataset. The code was as follows:
from sklearn.feature_selection import SelectKBest, mutual_info_regression

# feature selection
f_selector = SelectKBest(score_func=mutual_info_regression, k='all')
# learn the relationship from the training data
f_selector.fit(X_train, y_train)
# transform the train input data
X_train_fs = f_selector.transform(X_train)
# transform the test input data
X_test_fs = f_selector.transform(X_test)
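For reference, the fitted selector exposes the mutual-information scores via its `scores_` attribute, which can be paired with the column names to produce a ranking like the one below. A minimal sketch, using synthetic data from `make_regression` as a stand-in since the Boston data loading isn't shown here:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression

# synthetic stand-in for the training data (assumption: any numeric X, y)
X, y = make_regression(n_samples=300, n_features=5, n_informative=2,
                       random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])

f_selector = SelectKBest(score_func=mutual_info_regression, k='all')
f_selector.fit(X, y)

# pair feature names with their MI scores and sort, like the table below
scores = (pd.DataFrame({"Features": X.columns, "Scores": f_selector.scores_})
            .sort_values("Scores", ascending=False))
print(scores)
```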
The scores were as follows:
Features Scores
12 LSTAT 0.651934
5 RM 0.591762
2 INDUS 0.532980
10 PTRATIO 0.490199
4 NOX 0.444421
9 TAX 0.362777
0 CRIM 0.335882
6 AGE 0.334989
7 DIS 0.308023
8 RAD 0.206662
1 ZN 0.197742
11 B 0.172348
3 CHAS 0.027097
Out of curiosity, I mapped the VIF (variance inflation factor) alongside the scores, and I see that the features with high mutual-information scores also have very high VIFs.
Features Scores VIF_Factor
12 LSTAT 0.651934 11.102025
5 RM 0.591762 77.948283
2 INDUS 0.532980 14.485758
10 PTRATIO 0.490199 85.029547
4 NOX 0.444421 73.894947
9 TAX 0.362777 61.227274
0 CRIM 0.335882 2.100373
6 AGE 0.334989 21.386850
7 DIS 0.308023 14.699652
8 RAD 0.206662 15.167725
1 ZN 0.197742 2.844013
11 B 0.172348 20.104943
3 CHAS 0.027097 1.152952
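The VIF column above was presumably computed per feature against all the others; a minimal sketch of that computation with statsmodels' `variance_inflation_factor`, using synthetic data (the names `a`, `b`, `c` are illustrative, not from the original dataset):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = pd.DataFrame({"a": rng.normal(size=200)})
X["b"] = 2 * X["a"] + rng.normal(scale=0.1, size=200)  # nearly collinear with a
X["c"] = rng.normal(size=200)                          # independent feature

# VIF of feature i = 1 / (1 - R^2) from regressing column i on the others
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif)
```

A feature that is nearly a linear combination of the others (like `b` here) gets a very large VIF, while an independent feature (like `c`) stays near 1, which is the same pattern as in the table above.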
Given that the top-scoring features are also highly collinear, how should I select the best features from this list?
Topic variance mutual-information regression feature-selection
Category Data Science