Multiclass Classification with Decision Trees: Why do we calculate a score and apply softmax?

I'm trying to figure out why, when using decision trees for multi-class classification, it is common to calculate a score and apply softmax, instead of just averaging the terminal nodes' class probabilities.

Let's say our model is two trees. Tree 1 places example 14 in a terminal node with 20% class 1, 60% class 2, and 20% class 3. Tree 2 places example 14 in a terminal node with 100% class 2. Averaging gives [10%, 80%, 10%] as our prediction for training example 14.
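As a minimal numpy sketch of that averaging, with the leaf distributions hard-coded from the example above:

```python
import numpy as np

# Leaf class distributions for example 14, taken from the question:
# tree 1 puts it in a [20%, 60%, 20%] leaf, tree 2 in a pure class-2 leaf.
tree1_leaf = np.array([0.20, 0.60, 0.20])
tree2_leaf = np.array([0.00, 1.00, 0.00])

# Random-forest-style averaging of the terminal-node class frequencies.
prediction = (tree1_leaf + tree2_leaf) / 2
print(prediction)  # [0.1 0.8 0.1]
```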

Why use Softmax instead of this averaging approach?

Note: I am looking to apply this knowledge to understanding XGBoost better, as well as to a simple one-tree classification model.
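For the one-tree case, scikit-learn's `DecisionTreeClassifier` already does what the question describes: `predict_proba` is just the class-frequency vector of the leaf a sample lands in, with no score-plus-softmax step. A quick check (the dataset and depth are arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each row is the fraction of training samples of each class in the
# leaf that the sample falls into.
print(tree.predict_proba(X[:3]))
```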

Tags: softmax, xgboost, decision-trees



Depending on the parameters you used for your model, it may not be calibrated in probabilities. That is, your model outputs a score that is helpful for ranking your instances relative to one another, but the score may not reflect the real probability of the outcome happening.

Softmax will at least guarantee that your outputs are between 0 and 1 and sum to one. This gives you something closer to a probability (though that alone may not be enough for it to be calibrated to 'historical' probabilities).
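As a concrete illustration of that guarantee, here is a minimal softmax in numpy; the raw per-class scores are made up:

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability; it does not change the result.
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

raw_scores = np.array([1.3, 2.9, -0.4])  # arbitrary per-class scores
probs = softmax(raw_scores)
print(probs, probs.sum())  # every entry in (0, 1), and the sum is 1.0
```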

PS: In XGBoost this mechanism is built in for multi-class problems: with the multi:softmax or multi:softprob objectives, the trees output raw additive scores (margins) per class, and softmax converts the summed margins into class probabilities. Boosted trees hold additive corrections in their leaves rather than class frequencies, which is why their outputs are combined by summing scores and normalizing, not by averaging leaf probabilities.
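A sketch with the native xgboost API showing that relationship (the dataset, depth, and number of rounds are arbitrary choices for illustration): with multi:softprob, `predict()` returns probabilities, `output_margin=True` returns the raw per-class scores, and applying softmax to the margins reproduces the probabilities.

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
dtrain = xgb.DMatrix(X, label=y)

params = {"objective": "multi:softprob", "num_class": 3, "max_depth": 3}
bst = xgb.train(params, dtrain, num_boost_round=10)

probs = bst.predict(dtrain)                         # softmax already applied
margins = bst.predict(dtrain, output_margin=True)   # raw additive scores per class

# Applying softmax row-wise to the margins recovers the probabilities.
exps = np.exp(margins - margins.max(axis=1, keepdims=True))
manual_probs = exps / exps.sum(axis=1, keepdims=True)
print(np.allclose(probs, manual_probs))  # True
```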
