Decision tree: why is the Gini index only used for binary choices?

Question

Edouard99

October 23, 2021, 13:12

I would like to understand why the Gini index is said to operate on categorical target variables in terms of "success" or "failure" and to perform only binary splits. Why would it not be possible to have three branches after a split when using Gini impurity to select an attribute? Source: https://medium.com/analytics-steps/understanding-the-gini-index-and-information-gain-in-decision-trees-ab4720518ba8, and this is not the only resource saying so.

Topic gini-index decision-trees classification machine-learning

Category Data Science

Vladislav Gladkikh · Accepted Answer · October 23, 2021, 13:12

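Here is a good explanation of Gini impurity: link. I don't see why it can't be generalized to multiway splits: the impurity of a node is defined from its class proportions for any number of classes, and a split is scored by the size-weighted average impurity of its child nodes, however many there are.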
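
To make that concrete, here is a minimal sketch that scores a split with any number of child nodes. The function names `gini` and `split_gini` and the toy labels are invented for this illustration and are not taken from any library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of one node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(children):
    """Size-weighted average Gini impurity over any number of child nodes."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini(child) for child in children)


# Hypothetical toy data: three classes, four samples each.
parent = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

# The same parent node split two ways and three ways.
binary_split = [["a"] * 4, ["b"] * 4 + ["c"] * 4]
ternary_split = [["a"] * 4, ["b"] * 4, ["c"] * 4]

print("parent:       ", gini(parent))
print("binary split: ", split_gini(binary_split))
print("ternary split:", split_gini(ternary_split))
```

On this toy data the parent impurity is about 0.67, the binary split brings it down to about 0.33, and the three-way split reaches 0. Neither the three-class target nor the three-branch split needs any change to the formula.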

Binary splits are the easiest to implement (e.g. discussion: link). That is why mainstream frameworks implement them and countless blog posts describe them.

A non-binary split is equivalent to a sequence of binary splits (e.g. link). However, expressing it that way makes the tree more complicated. Furthermore, a particular tree learning algorithm applied to a particular dataset might not find the representation of a non-binary split that uses the smallest possible number of binary splits. This makes the tree even more complicated and less interpretable.
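
The equivalence can be sketched with two hand-built toy trees. The dictionary layout, the `color` attribute, and the leaf names below are all hypothetical and chosen only for this example.

```python
# One multiway node: a single test on "color" with three branches.
multiway_tree = {
    "attr": "color",
    "branches": {"red": "A", "green": "B", "blue": "C"},
}

# The same partition as two stacked binary tests of the form "attr == value?".
binary_tree = {
    "attr": "color",
    "value": "red",
    "yes": "A",
    "no": {"attr": "color", "value": "green", "yes": "B", "no": "C"},
}


def predict_multiway(tree, sample):
    """Follow the branch that matches the sample's attribute value."""
    return tree["branches"][sample[tree["attr"]]]


def predict_binary(tree, sample):
    """Walk down yes/no branches until a leaf (a plain string) is reached."""
    while isinstance(tree, dict):
        branch = "yes" if sample[tree["attr"]] == tree["value"] else "no"
        tree = tree[branch]
    return tree


# Both trees send every color to the same leaf.
for color in ("red", "green", "blue"):
    sample = {"color": color}
    assert predict_multiway(multiway_tree, sample) == predict_binary(binary_tree, sample)
```

The binary version needs two internal nodes and one extra level of depth to express what the multiway node does in a single test, which is the added complexity mentioned above.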

Non-binary splits may reflect the structure of the data better. There are publications on them (e.g. link and link), but if you want to use trees with non-binary splits, you will probably not find a framework that implements them in one line of code, and you will have to write the code from scratch. If you succeed, please publish it and put a link here in a comment; I would be interested.
