How to compare 10000 data frames in Python?

I have 10000 data frames like this:

Each data frame corresponds to a different card game (and so has different numbers in the table), and I want to compare all of these data frames together. For example, I want to compare their heat maps. Is there any way to do this in Python? Is there a toolset that lets me compare all of them in one diagram or something like that? I want to see a trend across all 10000 data frames.

Topic heatmap dataframe python

Category Data Science


Simple approach:

  1. Define an appropriate distance or similarity measure between two such data frames. It's unlikely that there is a standard measure for whatever game this represents. For example, you could have a distance measure which sums every value in each data frame and then returns the absolute difference between the two sums, but this is unlikely to correctly represent the semantics of the game.
  2. Implement this distance or similarity measure between two such data frames as a function.
  3. Call this function for every possible pair of data frames (that's 10000 × 9999 / 2 = 49,995,000 comparisons, which is doable).
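The three steps above could be sketched roughly as follows. This is only an illustration: `sum_distance` is the naive "difference of totals" measure mentioned in step 1 (not a recommendation), `pairwise_distances` is a hypothetical helper name, and the five small random tables stand in for the real 10000 data frames.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def sum_distance(a: pd.DataFrame, b: pd.DataFrame) -> float:
    """Naive example measure: absolute difference between the grand totals."""
    return abs(float(a.to_numpy().sum()) - float(b.to_numpy().sum()))


def pairwise_distances(frames, distance):
    """Return a symmetric n x n matrix with the distance for every pair."""
    n = len(frames)
    dist = np.zeros((n, n))
    # combinations() yields each unordered pair (i, j) with i < j exactly once
    for i, j in combinations(range(n), 2):
        dist[i, j] = dist[j, i] = distance(frames[i], frames[j])
    return dist


# Assumption: made-up stand-in data, 5 random 4x3 tables instead of 10000 real ones
rng = np.random.default_rng(0)
frames = [pd.DataFrame(rng.integers(0, 10, size=(4, 3))) for _ in range(5)]
dist = pairwise_distances(frames, sum_distance)
```

With a measure this simple the whole matrix could also be computed with NumPy broadcasting on the vector of totals. The explicit loop is shown because it works with any distance function you plug in.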

Fair warning: a heatmap of an unordered 10k x 10k matrix might not be very readable.
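One possible way to make such a heatmap more readable is to reorder the rows and columns so that similar data frames end up next to each other. The sketch below does this with the leaf order of a hierarchical clustering from SciPy. This is my own suggestion rather than part of the approach above, and the distance matrix is built from made-up totals purely for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

# Assumption: made-up stand-in data, one total per data frame, two shuffled groups
rng = np.random.default_rng(0)
totals = rng.permutation(
    np.concatenate([rng.normal(10, 1, 20), rng.normal(50, 1, 20)])
)
dist = np.abs(totals[:, None] - totals[None, :])

# linkage() expects the condensed (upper-triangle) form of the distance matrix
tree = linkage(squareform(dist), method="average")

# Leaf order of the dendrogram: similar items become neighbours
order = leaves_list(tree)
ordered = dist[np.ix_(order, order)]
```

Plotting `ordered` instead of `dist` (for example with `matplotlib.pyplot.imshow`) should show the two groups as blocks along the diagonal.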

Note that once you have the distance/similarity function, you could also:

  • use a clustering algorithm to group the data frames by similarity
  • detect outliers
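Both ideas can be tried directly on the distance matrix. Here is a minimal sketch, assuming SciPy is available. The choice of average-linkage hierarchical clustering, the number of clusters (3), and the "mean + 2 standard deviations" outlier rule are arbitrary illustrations, and the totals are made up.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Assumption: made-up stand-in data, two groups plus one value far from both
totals = np.array([10.0, 11.0, 12.0, 50.0, 51.0, 52.0, 200.0])
dist = np.abs(totals[:, None] - totals[None, :])

# Clustering: build the tree from the condensed matrix, then cut it
# so that at most 3 clusters remain
tree = linkage(squareform(dist), method="average")
labels = fcluster(tree, t=3, criterion="maxclust")

# Outliers: flag frames whose mean distance to all others is unusually large
mean_dist = dist.sum(axis=1) / (len(dist) - 1)
threshold = mean_dist.mean() + 2 * mean_dist.std()
outliers = np.flatnonzero(mean_dist > threshold)
```

In this toy example the first three and the next three items land in their own clusters, and the last item (index 6) is the only one flagged as an outlier.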
