Difference between Information Gain and Mutual Information for feature selection

What is the difference between information gain and mutual information?

At this point, I understand that information gain is calculated between a random variable and the target class for classification, while mutual information is calculated between any two random variables.

Does mutual information become the same as information gain when it is calculated between a random variable and the target class?

Topic: mutual-information, feature-selection

Category: Data Science


Information Gain (IG) measures the reduction in entropy obtained by conditioning a dataset (or random variable) on another variable, e.g., by splitting on a feature. Entropy here quantifies the uncertainty (impurity) of the class distribution: the purer the distribution, the lower the entropy, and the more predictable the target.
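To make that concrete, here is a minimal sketch (assuming NumPy; the `entropy` helper and the toy label arrays are arbitrary illustrations):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# A pure node has zero entropy; a 50/50 mix has maximal entropy (1 bit).
print(entropy(np.array([1, 1, 1, 1])))  # 0.0
print(entropy(np.array([0, 1, 0, 1])))  # 1.0
```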

As you've hinted, on a classification task both quantities measure how relevant a feature is to the target class. I've often observed various sources using the term information gain interchangeably with mutual information, and with good reason: for discrete variables the two are mathematically identical, and the measure is symmetric, i.e., I(X;Y) = I(Y;X).
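Because of this equivalence, feature-selection tooling typically exposes a single mutual-information scorer. For example, scikit-learn provides `mutual_info_classif`; a minimal usage sketch (the synthetic data below is made up for illustration):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 4))   # four discrete features
# The target depends mostly on feature 0, plus some noise.
y = (X[:, 0] + rng.integers(0, 2, size=200) > 1).astype(int)

# Estimated mutual information between each feature and the target;
# higher scores indicate more relevant features.
scores = mutual_info_classif(X, y, discrete_features=True, random_state=0)
print(scores)  # feature 0 should receive the highest score
```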

It is worth noting that both quantities are symmetric and non-negative. For mathematical completeness: the information gain of X given Y is IG(X|Y) = H(X) - H(X|Y), and the mutual information is I(X;Y) = sum_x sum_y P(x,y) log [P(x,y) / (P(x) P(y))]. Expanding the definitions shows that IG(X|Y) = I(X;Y), which answers your question directly: computed between a random variable and the target class, mutual information and information gain are the same quantity.
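Plugging a small joint distribution into both formulas confirms they coincide numerically (a minimal sketch; the 2x2 joint table is arbitrary):

```python
import numpy as np

# Arbitrary joint distribution P(X, Y) over two binary variables.
P = np.array([[0.30, 0.20],
              [0.10, 0.40]])
Px, Py = P.sum(axis=1), P.sum(axis=0)   # marginals P(X), P(Y)

# Information gain: IG(X|Y) = H(X) - H(X|Y).
H_X = -np.sum(Px * np.log2(Px))
H_X_given_Y = -np.sum(P * np.log2(P / Py))   # conditional entropy, summed over x and y
ig = H_X - H_X_given_Y

# Mutual information: I(X;Y) = sum_{x,y} P(x,y) log2[ P(x,y) / (P(x) P(y)) ].
mi = np.sum(P * np.log2(P / np.outer(Px, Py)))

print(ig, mi)  # identical up to floating-point error (~0.1245 bits)
```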


Hope it helps!
