Feature selection for regression

Suppose I have a response variable y and a set of feature variables (x1, x2 ... xn). I wish to find which of x1...xn are the best features for y in a regression problem (the relationship might not be linear).

Is there any way I can do this kind of feature selection without using any correlation measure or regression function in the process (i.e. I cannot use any filter or wrapper methods)?

Topic regression feature-selection

Category Data Science


If you do not want to use filter or wrapper feature selection methods, you can use tree-based algorithms to compute the feature importance of all the features. Random Forest, LightGBM, XGBoost, and CatBoost can all be used for this purpose. CatBoost is an interesting option because it can handle categorical features natively. A sketch of this approach is shown below.
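
As a minimal sketch of the tree-based route, here is a random forest whose built-in impurity importances rank the features; the synthetic data and the feature names are assumptions for illustration only:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 4)), columns=["x1", "x2", "x3", "x4"])
# y depends non-linearly on x1 and x2 only; x3 and x4 are pure noise.
y = np.sin(X["x1"]) + X["x2"] ** 2 + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances: one value per column, summing to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

Because the trees split on whatever reduces error, non-linear relationships such as the squared term above still show up in the importances.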

You can also use L1 or L2 regularization to select the best features; read up on the Lasso and Ridge algorithms. Note that L1 (Lasso) drives some coefficients to exactly zero, so it is the one that actually performs selection, whereas L2 (Ridge) only shrinks coefficients toward zero.
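
A minimal sketch of Lasso-based selection, assuming standardized features and an illustrative (untuned) alpha:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
# Only the first two columns actually drive y; the rest are noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=500)

# Standardise first so the L1 penalty treats all features on the same scale.
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)

coefs = pipe.named_steps["lasso"].coef_
# Features whose coefficients were driven to exactly zero are dropped.
selected = [f"x{i + 1}" for i, c in enumerate(coefs) if c != 0]
print("coefficients:", coefs)
print("selected features:", selected)
```

In practice you would tune alpha (e.g. with LassoCV) rather than fixing it as done here.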

A word of caution though: take feature selection methods with a pinch of salt and never rely on them alone. In my opinion, the best feature selection method is filtering out features based on domain knowledge.


Look up scikit-learn's feature selection module. In particular, f_regression and mutual_info_regression can be used to identify the best features for the problem at hand.
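
A minimal sketch of how these two scorers might be used via SelectKBest; the synthetic data and the choice of k=2 are assumptions for illustration:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression, mutual_info_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=500)

# f_regression captures linear association; mutual information also picks
# up non-linear dependence such as the squared term on the first feature.
for scorer in (f_regression, mutual_info_regression):
    selector = SelectKBest(score_func=scorer, k=2).fit(X, y)
    print(scorer.__name__,
          "scores:", np.round(selector.scores_, 3),
          "selected columns:", selector.get_support(indices=True))
```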


You can train a LightGBM regressor. LightGBM has feature importance measures embedded in it, and you can plot them directly to see which features are important. See this link: LightGBM Plot Importance
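
A minimal sketch of that idea; the synthetic data, feature names, and parameters below are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 4)), columns=["x1", "x2", "x3", "x4"])
y = np.exp(X["x1"]) + X["x2"] * X["x3"] + rng.normal(scale=0.1, size=500)

model = lgb.LGBMRegressor(n_estimators=200).fit(X, y)

# importance_type can be "split" (how often a feature is used in splits)
# or "gain" (total gain contributed by splits on that feature).
lgb.plot_importance(model, importance_type="gain")
plt.show()
```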
