How do tree-based algorithms handle linearly combined features?

While I am aware that tree-based algorithms (e.g., DT, RF, XGBoost) are 'immune' to multicollinearity, how do they handle linearly combined features? For example, is there any additional value or harm in including the three features a, b, and a+b in the model?

Topic collinearity linear-algebra xgboost decision-trees random-forest

Category Data Science


If the sum of the two features makes semantic sense in the domain, it might be a good idea.

But while trees handle redundant features fairly well, increasing the number of features without adding any extra "value" or "information" can hurt performance in certain situations. For example, if the sum adds no value and you aggressively subsample or restrict the number of features considered in each tree (or at each split), the two related features can take the place of a genuinely useful feature whenever they are both selected.
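A rough illustration of the redundancy effect (a toy sketch with scikit-learn; the data, labels, and hyperparameters here are all made up for demonstration): adding a redundant a+b column barely changes accuracy, but it absorbs impurity-based importance that was previously credited to a and b.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data (hypothetical): the label is driven by the sum a + b.
rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=n)
b = rng.normal(size=n)
y = (a + b + 0.1 * rng.normal(size=n) > 0).astype(int)

X_ab = np.column_stack([a, b])          # just the components
X_abs = np.column_stack([a, b, a + b])  # components plus the redundant sum

rf_ab = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_ab, y)
rf_abs = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_abs, y)

print("importances without sum:", rf_ab.feature_importances_)
print("importances with sum:   ", rf_abs.feature_importances_)
```

In runs like this, the a+b column tends to take a large share of the importance, and the individual importances of a and b drop accordingly. That does not hurt the model much by itself, but it matters if you rank features by importance for selection or interpretation.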

If you think the sum is a better feature than its components, you might consider adding the sum only, or the sum and one of the components.
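A quick sanity check of that option (again a hedged toy sketch with scikit-learn on hypothetical data where the sum is the "true" signal): compare held-out accuracy when training on the two components versus the sum alone.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical data: the label depends on the diagonal direction a + b.
rng = np.random.default_rng(1)
n = 2000
a = rng.normal(size=n)
b = rng.normal(size=n)
y = (a + b + 0.1 * rng.normal(size=n) > 0).astype(int)

def holdout_accuracy(X, y):
    """Fit a small random forest and score it on a held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    return rf.fit(X_tr, y_tr).score(X_te, y_te)

acc_components = holdout_accuracy(np.column_stack([a, b]), y)
acc_sum_only = holdout_accuracy((a + b).reshape(-1, 1), y)

print(f"features a, b : {acc_components:.3f}")
print(f"feature a + b : {acc_sum_only:.3f}")
```

The intuition: axis-aligned splits on a and b can only approximate a diagonal boundary with many small staircase steps, while a single split on a+b captures it directly, so when the sum really is the signal, the sum-only model tends to be at least as accurate with much shallower trees.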
