VIF vs Mutual Information

I was searching for the best way to do feature selection for a regression problem and came across a post suggesting mutual information for regression, so I tried it on the Boston housing dataset. The results were as follows:

# feature selection with mutual information scores
from sklearn.feature_selection import SelectKBest, mutual_info_regression

f_selector = SelectKBest(score_func=mutual_info_regression, k='all')

# learn the relationship from the training data
f_selector.fit(X_train, y_train)

# transform train input data
X_train_fs = f_selector.transform(X_train)

# transform test input data
X_test_fs = f_selector.transform(X_test)

The scores were as follows:

Index  Feature  MI Score
12     LSTAT    0.651934
5      RM       0.591762
2      INDUS    0.532980
10     PTRATIO  0.490199
4      NOX      0.444421
9      TAX      0.362777
0      CRIM     0.335882
6      AGE      0.334989
7      DIS      0.308023
8      RAD      0.206662
1      ZN       0.197742
11     B        0.172348
3      CHAS     0.027097
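
A minimal sketch of how a table like this can be built from the fitted selector (assuming X_train is a pandas DataFrame, so the column names are available):

import pandas as pd

# pair each feature name with its mutual-information score
scores = pd.DataFrame({'Feature': X_train.columns,
                       'Score': f_selector.scores_})

# sort descending, highest-scoring features first
print(scores.sort_values('Score', ascending=False))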

Out of curiosity, I mapped the VIF alongside the scores, and I see that the features with high scores also tend to have a very high VIF.

Index  Feature  MI Score  VIF_Factor
12     LSTAT    0.651934  11.102025
5      RM       0.591762  77.948283
2      INDUS    0.532980  14.485758
10     PTRATIO  0.490199  85.029547
4      NOX      0.444421  73.894947
9      TAX      0.362777  61.227274
0      CRIM     0.335882   2.100373
6      AGE      0.334989  21.386850
7      DIS      0.308023  14.699652
8      RAD      0.206662  15.167725
1      ZN       0.197742   2.844013
11     B        0.172348  20.104943
3      CHAS     0.027097   1.152952
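
For context, a VIF column like the one above can be computed per feature with statsmodels; a minimal sketch (again assuming X_train is a pandas DataFrame):

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# the VIF for feature i comes from regressing feature i on all the others
vif = pd.DataFrame({
    'Feature': X_train.columns,
    'VIF_Factor': [variance_inflation_factor(X_train.values, i)
                   for i in range(X_train.shape[1])]
})
print(vif)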

How do I select the best features from this list?

Topic: variance, mutual-information, regression, feature-selection

Category: Data Science


There is no single optimal answer to this question, but let me shed some light on how I have used these methods in the past.

I work with large numbers of features and with different types of models. As you may already know, there are linear and non-linear models. Linear models perform well when feature selection is done at the most basic level, whereas models like random forest or even XGBoost give you the opportunity to let more features in.

I have used both together, as sequential steps: apply SelectKBest first, and if I still have a lot of features, apply VIF to reduce them even more. From experience, when using boosted models you don't really need VIF, since these types of models know how to deal with multicollinearity. I apply VIF only when I use linear regression models. It really depends on which model you are going to feed those features into.
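
A minimal sketch of that two-step filter, assuming a pandas DataFrame X and target y; the k and vif_threshold values here are illustrative choices, not fixed rules:

import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from statsmodels.stats.outliers_influence import variance_inflation_factor

def select_features(X, y, k=10, vif_threshold=10.0):
    # Step 1: keep the k features with the highest mutual information
    selector = SelectKBest(score_func=mutual_info_regression, k=k).fit(X, y)
    X_kbest = X.loc[:, selector.get_support()]

    # Step 2: iteratively drop the feature with the largest VIF
    # until every remaining feature falls below the threshold
    while X_kbest.shape[1] > 1:
        vif = pd.Series(
            [variance_inflation_factor(X_kbest.values, i)
             for i in range(X_kbest.shape[1])],
            index=X_kbest.columns)
        if vif.max() <= vif_threshold:
            break
        X_kbest = X_kbest.drop(columns=vif.idxmax())

    return X_kbest

For a boosted model you would typically stop after step 1 (or skip selection entirely); step 2 mainly matters when the downstream model is a linear regression.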
