Feature selection with information gain (KL divergence) and mutual information yields different results

Question

Feature selection with information gain (KL divergence) and mutual information yields different results

AutoMiner

2022年3月16日 19:01

I'm comparing different techniques for feature selection / feature ranking. Two of the techniques under scrutiny are the mutual information (MI) and the information gain (IG) as used in decision trees, i.e. the Kullback-Leibler divergence.

My data (class and features) is all binary.

All sources I could find state, that MI and IG are basically "two sides of the same coin", i.e. that one can be tranformed into the oher via mathematical manipulation. (For example [source 1, source 2])

Yet, when I rank my features using the two measures they do not result in the same ranking order. But if the two measures are equivalent, shouldn't the ranking be the same?

Can someone help me understand why the rankings are different?

Topic mutual-information information-theory ranking feature-selection

Category Data Science

Fred Guth · Accepted Answer · 2021年6月5日 10:49

1

Fred Guth answered at 2021年6月5日 10:49

Remind that $I[X;Y]$ is symmetrical, but $KL(P \Vert Q)$ is not. $I[X;Y] = KL(P(X,Y)\Vert P(X)P(Y))$. Take a look if you are not computing $KL(P(X)\Vert P(Y))$.

Feature selection with information gain (KL divergence) and mutual information yields different results

About