Multiclass Classification with Decision Trees: Why do we calculate a score and apply softmax?
I'm trying to figure out why, when using decision trees for multi-class classification, it is common to calculate a score per class and apply softmax, instead of just averaging the terminal nodes' class probabilities.
Let's say our model is two trees. Example 14 falls into a terminal node of tree 1 with 20% class 1, 60% class 2, and 20% class 3, and into a terminal node of tree 2 with 100% class 2. Averaging gives a prediction of [10%, 80%, 10%] for example 14.
Why use softmax instead of this averaging approach?
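To make the comparison concrete, here is a small NumPy sketch of the two approaches as I understand them. The averaging numbers come from my example above; the per-class raw scores under the softmax approach are made-up values, purely for illustration of the mechanics:

```python
import numpy as np

# Leaf class distributions for example 14 from the two trees
tree1 = np.array([0.20, 0.60, 0.20])
tree2 = np.array([0.00, 1.00, 0.00])

# Approach 1: average the terminal-node class probabilities
avg_pred = (tree1 + tree2) / 2
print(avg_pred)  # [0.1, 0.8, 0.1]

# Approach 2 (XGBoost-style, as I understand it): each tree outputs a raw
# score per class; scores are summed across trees and then passed through
# softmax. These score values are hypothetical.
scores_tree1 = np.array([-0.5, 1.0, -0.5])
scores_tree2 = np.array([-1.0, 2.0, -1.0])
total_scores = scores_tree1 + scores_tree2

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(total_scores))  # roughly [0.011, 0.978, 0.011]
```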
Note: I am looking to apply this knowledge to understanding xgboost better, as well as a simple single-tree classification model.
Tags: softmax, xgboost, decision-trees
Category: Data Science