Does it make sense to use target encoding together with tree-based models?
I'm working on a regression problem with a few high-cardinality categorical features (forecasting different items with a single model). Someone suggested using target encoding (the mean or median of the target for each item) together with XGBoost.
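For concreteness, here is a minimal sketch of the encoding as I understand it. The toy data and the helper name `fit_target_encoding` are made up for illustration, and it uses the per-item mean only. In practice the encoding would presumably be fitted on training data only, to avoid leaking the target into the feature.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (item id, target value) pairs.
rows = [("A", 10.0), ("A", 14.0), ("B", 3.0), ("B", 5.0), ("C", 20.0)]


def fit_target_encoding(rows):
    """Return a dict mapping each item to the mean of its target values."""
    grouped = defaultdict(list)
    for item, y in rows:
        grouped[item].append(y)
    return {item: mean(ys) for item, ys in grouped.items()}


# Fit the encoding, then replace each item id by its mean target.
encoding = fit_target_encoding(rows)
encoded_feature = [encoding[item] for item, _ in rows]
```

The resulting `encoded_feature` column is what would be passed to the model in place of (or next to) the raw item id.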
While I understand how this new feature would improve a linear model (or GLMs in general), I do not understand how this approach would fit into a tree-based model (regression trees, random forest, boosting).
If the feature is used for splitting, items with a mean below the split value are separated from items with a mean above it. But since the final prediction is based on the mean of the targets in each leaf anyway, I do not see why this feature is helpful.
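The mechanism I have in mind can be sketched as a single split (a stump) on the encoded feature. This is only an illustration of my reasoning, not how XGBoost actually chooses splits; the data, the function name `stump_predict`, and the threshold are made up, and the threshold is assumed to leave both sides non-empty.

```python
from statistics import mean

# Hypothetical rows: (target-encoded feature value, target).
rows = [(12.0, 10.0), (12.0, 14.0), (4.0, 3.0), (4.0, 5.0), (20.0, 20.0)]


def stump_predict(rows, threshold):
    """Split on the encoded feature and predict the mean target per leaf."""
    left = [y for x, y in rows if x <= threshold]
    right = [y for x, y in rows if x > threshold]
    return mean(left), mean(right)


# Items whose mean target is at most 8.0 go left, all others go right.
left_pred, right_pred = stump_predict(rows, threshold=8.0)
```

Here the split separates the low-mean item from the two high-mean items, and each leaf then predicts the mean of the targets that fall into it.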
Therefore, my question is whether it is sensible to use target encoding in this setting. If so, I would be thankful for a short explanation.
Topic target-encoding categorical-encoding xgboost random-forest
Category Data Science