Mutual Information in sklearn
I expected sklearn's mutual_info_classif to give a value of 1 for the mutual information of a series of values with itself, but instead I'm seeing results ranging between about 1.0 and 1.5. What am I doing wrong?
This video on mutual information (from 4:56 to 6:53) says that when one variable perfectly predicts another, the mutual information score should be log_2(2) = 1. However, I do not get that result:
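import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import confusion_matrix

# A balanced binary series, compared against itself
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print("Confusion matrix:")
print(confusion_matrix(y, y))
print("Mutual information:")
# The single feature column is y itself; the target is also y
result = mutual_info_classif(pd.DataFrame(y), y)
print(result)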
which gives:
Confusion matrix:
[[5 0]
 [0 5]]
Mutual information:
[1.28730159]
When the two variables are independent, I do however see the expected value of zero:
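# x and y are independent: each value of x pairs equally often with 0 and 1 in y
x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print("Confusion matrix:")
print(confusion_matrix(x, y))
print("Mutual information:")
result = mutual_info_classif(pd.DataFrame(x), y)
print(result)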
which gives:
Confusion matrix:
[[5 5]
 [5 5]]
Mutual information:
[0]
Why am I not seeing a value of 1 for the first case?
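For comparison, here is a minimal sketch of the value I expected, computed with sklearn.metrics.mutual_info_score, which treats both inputs as discrete labels. That function reports its result in nats (natural logarithm), so the sketch divides by ln(2) to convert to bits. The last call passes discrete_features=True to mutual_info_classif; that this setting is the relevant difference from my code above is an assumption on my part, not something I have confirmed.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Discrete (contingency-table) estimate of I(y; y), reported in nats
mi_nats = mutual_info_score(y, y)

# Convert nats to bits by dividing by ln(2)
mi_bits = mi_nats / np.log(2)

print("Mutual information (nats):", mi_nats)
print("Mutual information (bits):", mi_bits)

# Assumption: telling mutual_info_classif that the feature is discrete
# makes it use a discrete estimate rather than a nearest-neighbour one
mi_classif = mutual_info_classif(pd.DataFrame(y), y, discrete_features=True)
print("mutual_info_classif with discrete_features=True:", mi_classif)
```

If this is right, the discrete estimate for a balanced binary variable with itself is ln(2) nats, which is exactly 1 bit, so the video's value of 1 and sklearn's output would differ by both the logarithm base and the estimator used.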
Topic mutual-information scikit-learn python
Category Data Science