Feature importance from random forests and boosted trees when two features are heavily correlated
I asked this question here, but it seems no one is interested in it.
Here is my understanding; please correct me if anything is wrong:
Tree models are used to rank feature importance by mean decrease in impurity
(let's ignore permutation importance here):
https://blog.datadive.net/selecting-good-features-part-iii-random-forests/
But trees have a weakness with heavily correlated features: once one of them has been used for a split, there is almost no impurity left for the other one to reduce, so the tree effectively selects only one of the two heavily correlated features (much like LASSO).
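Here is a minimal toy sketch of what I mean, assuming scikit-learn; the data, variable names, and parameter choices are my own, just to illustrate the point:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    n = 2000
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.01, size=n)   # x2 is almost a copy of x1
    x3 = rng.normal(size=n)                    # an independent feature
    y = 3 * x1 + x3 + rng.normal(scale=0.5, size=n)

    X = np.column_stack([x1, x2, x3])
    tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

    # MDI importance: once the tree has split on one of the pair, the other explains
    # almost nothing extra, so the shared importance lands on whichever was used first.
    print(dict(zip(["x1", "x2", "x3"], tree.feature_importances_.round(3))))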
I think feature selection by random forests and boosted trees inherits both the methodology and this disadvantage from single trees. The difference is:
random forest: each of the correlated features gets the chance to be the first split in a separate tree. The importance therefore tends to be spread evenly between them, which can make both features look unimportant (the importance is diluted).
boosting tree: boosting can be roughly regarded as continuing to split within one tree (each round fits the residual), so most likely only one of the correlated features will ever be split on. As a result, one feature will be selected as important and the other will be ignored (a toy comparison of the two behaviours is sketched below).
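To make the contrast concrete, here is a rough sketch of the comparison I have in mind, again assuming scikit-learn; the toy data and settings (e.g. `max_features=1` to force the forest to consider each feature) are my own choices, not from any reference:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

    rng = np.random.RandomState(0)
    n = 2000
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.01, size=n)   # x2 is almost a copy of x1
    x3 = rng.normal(size=n)
    y = 3 * x1 + x3 + rng.normal(scale=0.5, size=n)
    X = np.column_stack([x1, x2, x3])

    # Forest: column subsampling lets x1 and x2 take turns as the first split,
    # so the shared importance is split (diluted) between them.
    rf = RandomForestRegressor(n_estimators=500, max_features=1, random_state=0).fit(X, y)
    # Boosting: the ensemble tends to keep reusing whichever of the pair it picked first,
    # so the importance concentrates on one feature and the other is ignored.
    gb = GradientBoostingRegressor(n_estimators=500, max_depth=2, random_state=0).fit(X, y)

    print("random forest:", dict(zip(["x1", "x2", "x3"], rf.feature_importances_.round(3))))
    print("boosting     :", dict(zip(["x1", "x2", "x3"], gb.feature_importances_.round(3))))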
In summary, neither random forests nor boosted trees deal well with heavily correlated features. As an improvement, some books mention that wrapper methods such as
`randomized sparse models` and
`recursive feature elimination` can be used to reduce the impact of correlation. But:
Randomized sparse models: random forests and boosted trees already do feature (column) subsampling, so do the two effects overlap?
Recursive feature elimination: is this something like stepwise regression?
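For reference, this is my rough understanding of the two wrapper methods, sketched with scikit-learn on the same kind of toy data; the estimators, the hand-rolled stability-selection loop, and all parameter values are my own assumptions, not taken from the books:

    import numpy as np
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import Lasso, LinearRegression

    rng = np.random.RandomState(0)
    n = 2000
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.01, size=n)   # x2 is almost a copy of x1
    x3 = rng.normal(size=n)
    y = 3 * x1 + x3 + rng.normal(scale=0.5, size=n)
    X = np.column_stack([x1, x2, x3])

    # Recursive feature elimination: fit, drop the weakest feature, refit --
    # essentially backward stepwise selection driven by the model's own coefficients.
    rfe = RFE(LinearRegression(), n_features_to_select=2).fit(X, y)
    print("RFE keeps:", rfe.support_)

    # A hand-rolled stand-in for a "randomized sparse model" (stability selection):
    # fit a Lasso on many random subsamples and count how often each feature survives.
    selected = np.zeros(X.shape[1])
    n_rounds = 200
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n // 2, replace=False)
        coef = Lasso(alpha=0.05).fit(X[idx], y[idx]).coef_
        selected += (np.abs(coef) > 1e-6)
    print("selection frequency:", selected / n_rounds)

Is this the right picture, and if so, does the randomization here add anything beyond the column subsampling the tree ensembles already do?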