Distance Metric between 2 lists of sets

I have two lists of sets and I want to calculate a distance between them.

set1 = [
  {'A', 'B', 'C'},
  {'A', 'D', 'X'},
  {'X', 'A'}
]

set2 = [
  {'A', 'B', 'C', 'D'},
  {'A', 'X'},
  {'X', 'A', 'B'}
]

So if the two lists of sets are equal I want the distance to be 0, and if they are unequal I want the distance to be greater than 0.

The exact distance doesn't really matter as I'll ultimately be aggregating to compare multiple approaches to predicting this list of sets, so I really just need a relative distance.

My initial thought was a sum of Jaccard Distances, but I'm not sure how that would turn out.

Topic jaccard-coefficient distance

Category Data Science


Update

For a pairwise comparison, calculate the Jaccard distance for each pair of sets at the same position and take the norm of the resulting vector.

from numpy.linalg import norm

norm([ 1 - len(set.intersection(*p)) / len(set.union(*p)) for p in zip(set1,set2) ])
0.5335936864527374
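
The one-liner above assumes both lists have the same length and that no pair of sets is empty on both sides (that case would divide by zero). Below is a minimal sketch that makes those cases explicit. The function names `jaccard_distance` and `pairwise_distance` are hypothetical, and the choices to treat two empty sets as distance 0 and to compare an unmatched set against the empty set are assumptions for illustration, not part of the answer.

```python
from itertools import zip_longest
from math import sqrt

def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:  # assumption: two empty sets count as identical
        return 0.0
    return 1 - len(a & b) / len(union)

def pairwise_distance(list1, list2):
    """Euclidean norm of the position-wise Jaccard distances."""
    # Assumption: a set with no partner is compared against the empty set,
    # which gives distance 1 for any non-empty set.
    pairs = zip_longest(list1, list2, fillvalue=set())
    return sqrt(sum(jaccard_distance(a, b) ** 2 for a, b in pairs))

set1 = [{'A', 'B', 'C'}, {'A', 'D', 'X'}, {'X', 'A'}]
set2 = [{'A', 'B', 'C', 'D'}, {'A', 'X'}, {'X', 'A', 'B'}]

print(pairwise_distance(set1, set2))  # about 0.5336, same as the one-liner
print(pairwise_distance(set1, set1))  # 0.0
```

Replacing the norm with a plain sum of the per-pair distances would also satisfy the "0 if equal, greater than 0 otherwise" requirement; the norm just weights large per-pair differences more heavily.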

Original answer

You can calculate the Jaccard distance between the two lists, treating each inner set as a single element.

With set1 and set2 as defined in the question:

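# Use frozenset so each inner set is hashable and compares by content;
# ''.join(s) depends on set iteration order, which is not guaranteed.
sc = [{frozenset(s) for s in st} for st in (set1, set2)]
1 - len(set.intersection(*sc)) / len(set.union(*sc))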
0.8
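
Note that this approach ignores the order of the sets and counts a set as shared only when it matches exactly, whereas the pairwise version compares sets by position and gives partial credit for overlap. A small sketch of that behaviour follows; it uses `frozenset` so the inner sets are hashable, and the function name `set_of_sets_distance` is hypothetical.

```python
def set_of_sets_distance(list1, list2):
    """Jaccard distance between two lists of sets, each set treated as one element."""
    s1 = {frozenset(s) for s in list1}
    s2 = {frozenset(s) for s in list2}
    union = s1 | s2
    if not union:  # assumption: two empty lists count as identical
        return 0.0
    return 1 - len(s1 & s2) / len(union)

set1 = [{'A', 'B', 'C'}, {'A', 'D', 'X'}, {'X', 'A'}]
set2 = [{'A', 'B', 'C', 'D'}, {'A', 'X'}, {'X', 'A', 'B'}]

print(set_of_sets_distance(set1, set2))        # 0.8: only {'A', 'X'} is shared
print(set_of_sets_distance(set1, set1[::-1]))  # 0.0: order is ignored
```

If the position of each set in the list carries meaning for your predictions, the pairwise version is probably the better fit; if only the collection of sets matters, this one is.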

Hope this helps.
