Changes to the standard heatmap plot: symmetric colorbar, upper-triangle values only, and column names as x/y axis ticks

I have a heatmap image (correlation between all matrix columns) and I'm struggling to make all of the changes below within the same image:

  1. the colorbar colors should be symmetric around zero (e.g., correlations of 1 and -1 should get the same color)
  2. since correlation values are symmetric, show only the upper triangle of the correlation matrix (mask out the lower triangle)
  3. show the correlation value in every cell of that upper triangle
  4. the x and y axis ticks should show the column names (instead of serial numbers)
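Requirements 2 and 3 come down to choosing which cells of the p x p matrix stay visible and get a label. Below is a minimal sketch, assuming a plain NumPy correlation matrix; the helper names `upper_triangle` and `annotated_cells` and the random data are hypothetical, for illustration only. It hides the lower triangle with a masked array rather than setting it to 0, so a genuine zero correlation stays distinguishable from a hidden cell.

```python
import numpy as np

def upper_triangle(corr):
    """Return corr as a masked array with the strict lower triangle hidden.

    np.tril(..., k=-1) marks the cells below the diagonal; the diagonal
    itself (always 1.0) and everything above it stay visible.
    """
    lower = np.tril(np.ones_like(corr, dtype=bool), k=-1)
    return np.ma.masked_array(corr, mask=lower)

def annotated_cells(masked_corr):
    """List (row, col, value) for every visible cell, i.e. the cells to label."""
    rows, cols = np.nonzero(~np.ma.getmaskarray(masked_corr))
    return [(int(i), int(j), float(masked_corr[i, j])) for i, j in zip(rows, cols)]

# Hypothetical data: 100 samples of 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
corr = np.corrcoef(X, rowvar=False)   # 4 x 4, one row/column per variable

masked = upper_triangle(corr)
cells = annotated_cells(masked)
print(len(cells))  # 10, i.e. 4 * 5 / 2 visible cells
```

The masked array can be passed directly to `pcolor` or `imshow`, both of which leave masked cells blank (transparent by default).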

This is the code:

from datetime import datetime

import matplotlib.pyplot as plt
import numpy as np


def generate_heatmap(X):

    """
    Pearson Correlation Heatmap Plot

    :return:
    """
    print("Start Pearson Correlation Heatmap Plot  .. ", datetime.now())

    plt.figure(figsize=(10,8))
    plt.title('Pearson Correlation of miRNAs', y=1.05, size=15)

    # Correlation matrix for heatmap
    corr = np.corrcoef(X.transpose())

    plt.imshow(corr, cmap='BuPu', interpolation='nearest')
    plt.colorbar()
    plt.show()
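`np.corrcoef` treats each *row* as a variable by default, which is why `X.transpose()` is needed here: without it you would get an n x n matrix of sample-to-sample correlations instead of the p x p matrix of column correlations. A quick check, assuming `X` is a pandas DataFrame (the data and the column names are hypothetical), that the three common ways of writing this agree:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 50 samples (rows) of 3 features (columns).
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(50, 3)), columns=["miR-a", "miR-b", "miR-c"])

corr_t = np.corrcoef(X.transpose())       # rows of X.T are the variables
corr_rv = np.corrcoef(X, rowvar=False)    # columns of X are the variables
corr_pd = X.corr().to_numpy()             # pandas default: Pearson, column-wise

print(corr_t.shape)                                            # (3, 3), not (50, 50)
print(np.allclose(corr_t, corr_rv), np.allclose(corr_t, corr_pd))  # True True
```

`X.corr()` also keeps the column names as its index and column labels, which is convenient for requirement 4.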



This is how I obtained the desired plot:

from datetime import datetime

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap


def generate_heatmap(X):
    """
    Pearson Correlation Heatmap Plot

    :param X: pandas DataFrame, one column per variable (miRNA)
    :return: None (shows the figure)
    """
    print("Start Pearson Correlation Heatmap Plot  .. ", datetime.now())

    # get column names
    cols = X.columns
    # define plot for heatmap
    fig, ax = plt.subplots(figsize=(16,16))

    # ------------------------------------------------------------
    # Correlation matrix for heatmap. rowvar=False treats each column as a variable,
    # so we get a p x p matrix (rather than an n x n one)
    corr = np.corrcoef(X, rowvar=False)
    # show only upper matrix triangle - mask out (hide) the lower triangle of corr data,
    # so pcolor leaves those cells blank instead of drawing them as 0
    lower = np.tril(np.ones_like(corr, dtype=bool), k=-1)
    corr = np.ma.masked_array(corr, mask=lower)

    # ------------------------------------------------------------
    # Edit graphics of the plot ('size' and 'fontsize' are aliases, so pass only one)
    plt.title('Pearson Correlation of ' + str(len(cols)) + ' miRNAs', y=1.05, fontsize=32)

    # bar colors should be symmetric around zero: red at -1 and +1, white at 0
    colors = [(1, 0, 0), 'w', (1, 0, 0)]
    cm = LinearSegmentedColormap.from_list('heatmap', colors, N=20)

    # ------------------------------------------------------------
    # Heatmap based on corr matrix we provided
    c = plt.pcolor(corr, edgecolors='w', linewidths=2, cmap=cm, vmin=-1.0, vmax=1.0)

    # ------------------------------------------------------------
    # Editing additional graphics of the plot (if not too big)
    if len(cols) < 50:

        # set axis label names; pcolor cell (i, j) spans [j, j+1] x [i, i+1],
        # so the cell centres (and the ticks) sit at k + 0.5
        ax.set_xticks(np.arange(len(cols)) + 0.5)
        ax.set_xticklabels(labels = cols, rotation=45, fontsize=12, ha='center')

        ax.set_yticks(np.arange(len(cols)) + 0.5)
        ax.set_yticklabels(labels = cols, rotation=45, fontsize=12)

        # show corr values in every cell of the upper triangle (i <= j);
        # the lower triangle holds the symmetric values, so it stays unannotated
        for i, j in zip(*np.triu_indices(len(cols))):
            # +0.5 moves the text to the cell centre, where ha/va='center' align it
            ax.text(j + 0.5, i + 0.5, '{:0.2f}'.format(corr[i, j]), ha='center', va='center', fontsize=11)

    plt.colorbar(c)
    plt.show()

#generate_heatmap(miRNA_data[selected_mir_columns])
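For comparison, here is a minimal sketch of the same plot drawn with `imshow` instead of `pcolor`, assuming `X` is a pandas DataFrame with one column per variable. `imshow` centres cell (i, j) on the integer coordinates (j, i) and puts row 0 at the top, so ticks and text labels need no offset and the visible triangle sits in the upper right, as in the printed matrix. The function name `plot_upper_corr`, the `max_labels` threshold and the random demo data are hypothetical. `N=21` (odd) is a deliberate choice so that a correlation of exactly 0 maps to the pure-white middle color.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap

def plot_upper_corr(X, max_labels=50):
    """Upper-triangle Pearson heatmap of the columns of DataFrame X (sketch)."""
    cols = list(X.columns)
    p = len(cols)
    corr = np.corrcoef(X, rowvar=False)

    # Hide (not zero) the strict lower triangle; imshow draws masked cells transparent.
    lower = np.tril(np.ones((p, p), dtype=bool), k=-1)
    corr = np.ma.masked_array(corr, mask=lower)

    # Red-white-red: -1 and +1 share a colour, 0 is white (requirement 1).
    cmap = LinearSegmentedColormap.from_list("sym_red", [(1, 0, 0), "w", (1, 0, 0)], N=21)

    fig, ax = plt.subplots(figsize=(8, 8))
    im = ax.imshow(corr, cmap=cmap, vmin=-1.0, vmax=1.0, interpolation="nearest")
    ax.set_title(f"Pearson correlation of {p} variables", fontsize=16)

    if p < max_labels:
        # Column names as tick labels (requirement 4); cells are centred on integers.
        ax.set_xticks(np.arange(p))
        ax.set_xticklabels(cols, rotation=45, ha="right")
        ax.set_yticks(np.arange(p))
        ax.set_yticklabels(cols)
        # One label per visible cell, i.e. the upper triangle (requirement 3).
        for i, j in zip(*np.triu_indices(p)):
            ax.text(j, i, f"{corr[i, j]:.2f}", ha="center", va="center", fontsize=9)

    fig.colorbar(im, ax=ax)
    return fig, ax

# Hypothetical demo data: 60 samples of 4 variables.
rng = np.random.default_rng(2)
demo = pd.DataFrame(rng.normal(size=(60, 4)), columns=["a", "b", "c", "d"])
fig, ax = plot_upper_corr(demo)
# fig.savefig("corr_heatmap.png")  # or plt.show() in an interactive session
```

Masked cells are left transparent and unlabeled, so a genuine correlation of 0 above the diagonal still gets its `0.00` label instead of blending into the hidden half.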

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.