Does it make sense to use target encoding together with tree-based models?

Question

KJA

May 17, 2022, 13:28

I'm working on a regression problem with a few high-cardinality categorical features (forecasting different items with a single model). Someone suggested using target encoding (replacing each item with the mean or median of the target for that item) together with xgboost.
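To make the setup concrete, here is a minimal sketch of what I understand by target encoding, in plain Python. The function names, the toy data, and the choice of the global mean as a fallback for unseen items are my own assumptions for illustration, not part of any library.

```python
from collections import defaultdict
from statistics import mean


def fit_target_encoding(items, targets):
    """Map each item to the mean target observed for it in the training data."""
    grouped = defaultdict(list)
    for item, y in zip(items, targets):
        grouped[item].append(y)
    return {item: mean(ys) for item, ys in grouped.items()}


def apply_target_encoding(items, encoding, default):
    """Replace each item by its encoded value; unseen items get `default`."""
    return [encoding.get(item, default) for item in items]


# Hypothetical toy data: three items with very different target levels.
train_items = ["a", "a", "b", "b", "c"]
train_targets = [10.0, 12.0, 100.0, 104.0, 11.0]

encoding = fit_target_encoding(train_items, train_targets)
global_mean = mean(train_targets)  # assumed fallback for items never seen in training

# Item "d" does not occur in the training data, so it receives the global mean.
encoded = apply_target_encoding(["a", "b", "c", "d"], encoding, global_mean)
```

The encoded column would then be passed to the model in place of (or next to) the raw item identifier. In practice the encoding is usually computed out-of-fold to limit target leakage, which this sketch leaves out.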

While I understand how this new feature would improve a linear model (or GLMs in general), I do not understand how this approach would fit into a tree-based model (regression trees, random forests, boosting).

If the feature is used for a split, items whose mean is below the split value are separated from those whose mean is above it. But since the final prediction is based on the mean of each leaf anyway, I do not see why this feature is helpful.
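The mechanics I have in mind can be sketched with a single-split regression tree (a stump) fitted by squared error on the encoded feature. This is a simplified illustration under my own assumptions (made-up data, a hypothetical `best_stump` helper, plain leaf means), not how xgboost is actually implemented.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_stump(feature, targets):
    """Find the threshold on one numeric feature that minimises total SSE.

    Assumes the feature takes at least two distinct values.
    Returns (threshold, left leaf prediction, right leaf prediction).
    """
    candidates = sorted(set(feature))
    best = None
    # Try the midpoint between each pair of neighbouring feature values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(feature, targets) if x <= threshold]
        right = [y for x, y in zip(feature, targets) if x > threshold]
        loss = sse(left) + sse(right)
        if best is None or loss < best[0]:
            best = (loss, threshold, mean(left), mean(right))
    _, threshold, left_pred, right_pred = best
    return threshold, left_pred, right_pred


# Hypothetical data: the feature is the per-item target mean of three items,
# two with low targets (encoded as 11.0) and one with high targets (102.0).
encoded_item = [11.0, 11.0, 102.0, 102.0, 11.0]
targets = [10.0, 12.0, 100.0, 104.0, 11.0]

threshold, left_pred, right_pred = best_stump(encoded_item, targets)
```

Here the split separates the two low-mean items from the high-mean item, and each leaf then predicts the mean target of the rows that fall into it.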

My question is therefore whether it is sensible to use target encoding in this setting. If so, I would be thankful for a short explanation.

Topic target-encoding categorical-encoding xgboost random-forest

Category Data Science
