Feature importance from random forests and boosted trees when two features are heavily correlated
I asked this question here, but it seems no one is interested in it.
Here is my understanding; please correct me if anything is wrong:
Tree models are used to rank feature importance by mean decrease in impurity
(let's ignore permutation importance here):
https://blog.datadive.net/selecting-good-features-part-iii-random-forests/
But trees have a weakness with heavily correlated features: once one of them has been used for a split, there is almost no impurity left for the other one to reduce, so the tree effectively selects only one of the two heavily correlated features (much like LASSO).
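Here is a minimal toy sketch of what I mean, assuming scikit-learn; the data, variable names, and parameter choices are my own, just to illustrate the point:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    n = 2000
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.01, size=n)   # x2 is almost a copy of x1
    x3 = rng.normal(size=n)                    # an independent feature
    y = 3 * x1 + x3 + rng.normal(scale=0.5, size=n)

    X = np.column_stack([x1, x2, x3])
    tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

    # MDI importance: once the tree has split on one of the pair, the other explains
    # almost nothing extra, so the shared importance lands on whichever was used first.
    print(dict(zip(["x1", "x2", "x3"], tree.feature_importances_.round(3))))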
I think feature selection by random forests and boosted trees inherits both the methodology and this disadvantage from single trees. The difference is:
random forest: each of the correlated features gets the chance to be the first split in a separate tree. The importance therefore tends to be spread evenly between them, which can make both features look unimportant (the importance is diluted).
boosting tree: boosting can be roughly regarded as continuing to split within one tree (each round fits the residual), so most likely only one of the correlated features will ever be split on. As a result, one feature will be selected as important and the other will be ignored (a toy comparison of the two behaviours is sketched below).
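To make the contrast concrete, here is a rough sketch of the comparison I have in mind, again assuming scikit-learn; the toy data and settings (e.g. `max_features=1` to force the forest to consider each feature) are my own choices, not from any reference:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

    rng = np.random.RandomState(0)
    n = 2000
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.01, size=n)   # x2 is almost a copy of x1
    x3 = rng.normal(size=n)
    y = 3 * x1 + x3 + rng.normal(scale=0.5, size=n)
    X = np.column_stack([x1, x2, x3])

    # Forest: column subsampling lets x1 and x2 take turns as the first split,
    # so the shared importance is split (diluted) between them.
    rf = RandomForestRegressor(n_estimators=500, max_features=1, random_state=0).fit(X, y)
    # Boosting: the ensemble tends to keep reusing whichever of the pair it picked first,
    # so the importance concentrates on one feature and the other is ignored.
    gb = GradientBoostingRegressor(n_estimators=500, max_depth=2, random_state=0).fit(X, y)

    print("random forest:", dict(zip(["x1", "x2", "x3"], rf.feature_importances_.round(3))))
    print("boosting     :", dict(zip(["x1", "x2", "x3"], gb.feature_importances_.round(3))))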
In summary, neither random forests nor boosted trees deal well with heavily correlated features. As an improvement, some books mention that wrapper methods such as
`randomized sparse models` and
`recursive feature elimination` can be used to reduce the impact of correlation. But:
Randomized sparse models: random forests and boosted trees already do feature (column) subsampling, so do the two effects overlap?
Recursive feature elimination: is this something like stepwise regression?
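For reference, this is my rough understanding of the two wrapper methods, sketched with scikit-learn on the same kind of toy data; the estimators, the hand-rolled stability-selection loop, and all parameter values are my own assumptions, not taken from the books:

    import numpy as np
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import Lasso, LinearRegression

    rng = np.random.RandomState(0)
    n = 2000
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.01, size=n)   # x2 is almost a copy of x1
    x3 = rng.normal(size=n)
    y = 3 * x1 + x3 + rng.normal(scale=0.5, size=n)
    X = np.column_stack([x1, x2, x3])

    # Recursive feature elimination: fit, drop the weakest feature, refit --
    # essentially backward stepwise selection driven by the model's own coefficients.
    rfe = RFE(LinearRegression(), n_features_to_select=2).fit(X, y)
    print("RFE keeps:", rfe.support_)

    # A hand-rolled stand-in for a "randomized sparse model" (stability selection):
    # fit a Lasso on many random subsamples and count how often each feature survives.
    selected = np.zeros(X.shape[1])
    n_rounds = 200
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n // 2, replace=False)
        coef = Lasso(alpha=0.05).fit(X[idx], y[idx]).coef_
        selected += (np.abs(coef) > 1e-6)
    print("selection frequency:", selected / n_rounds)

Is this the right picture, and if so, does the randomization here add anything beyond the column subsampling the tree ensembles already do?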