How to plot a heatmap-like plot for categorical features?

I would greatly appreciate it if you could let me know how to plot a heatmap-like plot for categorical features.

In fact, based on this post, the association between categorical variables should be computed using Cramér's V. I found the following code to plot it, but I don't understand why the author also plotted it for "contrib" (the contribution), which is a numeric variable.

import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as ss
import seaborn as sns


def cramers_corrected_stat(confusion_matrix):
    """Calculate Cramér's V statistic for categorical-categorical association.

    Uses the bias correction from Bergsma (2013),
    Journal of the Korean Statistical Society 42: 323-328.
    Expects a pandas contingency table, e.g. the output of pd.crosstab.
    """
    chi2 = ss.chi2_contingency(confusion_matrix)[0]
    n = confusion_matrix.sum().sum()  # total number of observations
    phi2 = chi2 / n
    r, k = confusion_matrix.shape
    # bias-corrected phi^2 and corrected numbers of rows and columns
    phi2corr = max(0, phi2 - ((k - 1) * (r - 1)) / (n - 1))
    rcorr = r - ((r - 1) ** 2) / (n - 1)
    kcorr = k - ((k - 1) ** 2) / (n - 1)
    return np.sqrt(phi2corr / min((kcorr - 1), (rcorr - 1)))


# df is assumed to be a DataFrame that contains these columns
cols = ["Party", "Vote", "contrib"]
corrM = np.zeros((len(cols), len(cols)))
# there's probably a nice pandas way to do this
for col1, col2 in itertools.combinations(cols, 2):
    idx1, idx2 = cols.index(col1), cols.index(col2)
    corrM[idx1, idx2] = cramers_corrected_stat(pd.crosstab(df[col1], df[col2]))
    corrM[idx2, idx1] = corrM[idx1, idx2]  # mirror so the matrix is symmetric

corr = pd.DataFrame(corrM, index=cols, columns=cols)
fig, ax = plt.subplots(figsize=(7, 6))
ax = sns.heatmap(corr, annot=True, ax=ax)
ax.set_title("Cramér's V Association between Variables")

I also found Bokeh. However, I am not sure whether it uses Cramér's V to plot the heatmap.

In my case, I have two categorical features: the first one has 2 categories and the second one has 37 categories.

I need the plot to look like the last two plots presented here, but with the association values displayed on it as well.
Thanks in advance.

Topic heatmap visualization python statistics categorical-data

Category Data Science


It might not be useful to plot the relationship between categorical features. The visualization would imply an ordering of the categorical values, which might lead to incorrect interpretations.

A more useful option might be a contingency table. One feature would be in the rows, another feature would be in the columns. The cells would be the counts of co-occurrence.
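
As a rough illustration of the contingency-table idea, here is a minimal sketch using pandas and SciPy. The DataFrame, the column names "group" and "region", and the random data are all made up for the example; "region" has only 4 categories and stands in for the 37-category feature from the question. With only two features, Cramér's V comes out as a single number rather than a matrix, so the sketch simply prints it next to the table. The plain (uncorrected) form of the statistic is used here, not the bias-corrected one from the question.

```python
import numpy as np
import pandas as pd
import scipy.stats as ss

# Hypothetical toy data: "group" has 2 categories and "region" has 4
# (a stand-in for the 37-category feature described in the question).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=200),
    "region": rng.choice(["north", "south", "east", "west"], size=200),
})

# Contingency table: one feature in the rows, the other in the columns,
# and each cell holds the co-occurrence count.
table = pd.crosstab(df["group"], df["region"])

# Plain Cramér's V for this single pair of features:
# sqrt(chi2 / (n * (min(rows, cols) - 1)))
chi2 = ss.chi2_contingency(table, correction=False)[0]
n = table.to_numpy().sum()
r, k = table.shape
cramers_v = np.sqrt(chi2 / (n * (min(r, k) - 1)))

print(table)
print(f"Cramér's V = {cramers_v:.3f}")
```

If a picture is still wanted, the same table could be passed to seaborn, for example sns.heatmap(table, annot=True, fmt="d"), which writes the counts into the cells. The order of the rows and columns stays arbitrary, so the colours should be read cell by cell rather than as a trend.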
