Decision trees are widely used when implementing machine learning algorithms. The hierarchical structure of a decision tree leads us to the final outcome by traversing the nodes of the tree. In simple terms, the 'knowledge' a decision tree learns during training is formulated directly into a hierarchical structure. This structure holds and displays the knowledge in such a way that it can easily be understood, even by non-experts.
Each node holds an attribute or feature and is further split into more nodes as we move down the tree. But how do we decide:
- Which attribute/feature should be placed at the root node?
- Which features will act as internal nodes or leaf nodes?
To decide this, and to determine how to split the tree, we use splitting measures such as the Gini Index, Information Gain, etc.
Gini Index
The Gini index, also called the Gini coefficient or Gini impurity, measures the probability of a specific element being classified incorrectly when it is picked at random. It works on categorical target variables, treats each outcome as either a "success" or a "failure", and hence performs binary splits only. It is not as computationally intensive as its counterpart, Information Gain. From the Gini index, another quantity called Gini Gain is derived, which the decision tree maximizes at each split when building a CART model.
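For reference, the Gini Gain of a candidate split is commonly defined as the parent node's impurity minus the weighted impurity of its children (here $N_k$ denotes the number of samples in child $k$ and $N$ the number of samples in the parent):

$$\text{Gini Gain} = Gini(\text{parent}) - \sum_{k} \frac{N_k}{N}\, Gini(\text{child}_k)$$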
The value of the Gini index ranges from 0 to 1, where:
- 0 means that all the elements belong to a single class, i.e. the node is pure,
- 1 means that the elements are randomly distributed across various classes, and
- 0.5 means that the elements are uniformly distributed over some classes.
It was proposed by Leo Breiman in 1984 as an impurity measure for decision tree learning.
Mathematically, the Gini index is represented by
$$Gini = 1 - \sum_{i=1}^{C} p(i)^2$$
where,
- $C$ is the total number of classes,
- $p(i)$ is the probability of picking a data point with class $i$.
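As a quick illustration (with made-up numbers), consider a node containing 4 samples of class A and 6 samples of class B. Then $p(A) = 0.4$ and $p(B) = 0.6$, so

$$Gini = 1 - (0.4^2 + 0.6^2) = 1 - 0.52 = 0.48.$$

A pure node containing only one class would instead give $Gini = 1 - 1^2 = 0$.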
While building the decision tree, we would prefer choosing the attribute/feature with the least Gini index as the root node.
The Gini index is used in the classic CART algorithm and is very easy to calculate.
Gini Index:
for each branch in split:
    Calculate the percent of samples the branch represents  #Used for weighting
    for each class in branch:
        Calculate the probability of the class in the given branch.
        Square the class probability.
    Sum the squared class probabilities.
    Subtract the sum from 1.  #This is the Gini index for the branch
Weight each branch's Gini index by the percent of samples it represents.
Sum the weighted Gini indices to obtain the Gini index for the split.
Implementation in Python and R
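Below is a minimal Python sketch of the pseudocode above, assuming each branch of a split is given as a list of class labels; the function names `gini_index` and `weighted_gini` are illustrative, not from a specific library. An R version would follow the same steps.

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity of a single branch: 1 - sum of squared class probabilities."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

def weighted_gini(branches):
    """Gini index of a split: branch impurities weighted by branch size."""
    total = sum(len(branch) for branch in branches)
    return sum(len(branch) / total * gini_index(branch) for branch in branches)

# Example: a candidate split that separates the classes fairly well.
left  = ["A", "A", "A", "B"]         # mostly class A
right = ["B", "B", "B", "B", "A"]    # mostly class B
print(gini_index(left))              # 0.375
print(gini_index(right))             # 0.32
print(weighted_gini([left, right]))  # ~0.344
```

The attribute whose split yields the lowest weighted Gini index (equivalently, the highest Gini Gain) would be the one chosen for the node.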