MaxDiff Histogram usage

Question

Evan Gertis

October 16, 2021 13:18

I would like to understand how the MaxDiff algorithm works. The only example that I could find online was a set of slides by Chris Crifton.

Given an attribute C with the values 12, 5, 5, 5, 12, 1, 1, 20, 20, 2, 2, 12, 20, 14, 20, 5, 5, 20, 15, 15, 2, 2, 1, 3, 3, 5, 5, 5, I'd like to identify the range and frequency of each bucket when creating a MaxDiff histogram for C using beta=7.

I would be able to figure this out if there were more examples online. Does anyone know where I could find an example? I would appreciate help trying to solve this. The first step is sorting the data:

1 1 1 2 2 2 2 3 3 5 5 5 5 5 5 5 5 12 12 12 14 15 15 20 20 20 20 20
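The sorting step can be double-checked with a few lines of Python. This is only a sketch of the preprocessing; the names `C`, `sorted_c` and `freq` are made up for illustration. It sorts the attribute and counts how often each distinct value occurs, which is the (value, frequency) view a histogram is built from.

```python
from collections import Counter

# The raw attribute values from the question.
C = [12, 5, 5, 5, 12, 1, 1, 20, 20, 2, 2, 12, 20, 14, 20, 5, 5, 20,
     15, 15, 2, 2, 1, 3, 3, 5, 5, 5]

# Step 1: sort the data.
sorted_c = sorted(C)

# Step 2: count occurrences of each distinct value, ordered by value.
freq = sorted(Counter(C).items())

print(sorted_c)
print(freq)  # list of (value, frequency) pairs
```

There are 28 values in total and 8 distinct ones, so with beta=7 at least two distinct values have to share a bucket.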

The bucket boundary is (max-min)/2 = (20-1)/2 = (19)/2 = 9.5

1 1 1 2 2 2 2 3 3 5 5 5 5 5 5 5 5 | 12 12 12 14 15 15 20 20 20 20 20

Would I then calculate the next bin boundaries as (5-1)/2 = 2 and (20-12)/2 = 4, and add 4 to the min of the second partition?

1 1 1 2 2 2 2 | 3 3 5 5 5 5 5 5 5 5 | 12 12 12 14 15 15 | 20 20 20 20 20

Then (2-1)/2 = 0.5, (5-3)/2 = 1, (15-12)/2 = 1.5

1 1 1 | 2 2 2 2 | 3 3 | 5 5 5 5 5 5 5 5 | 12 12 12 | 14 15 15 | 20 20 20 20 20

I would like to know if I'm doing this correctly. How do I determine the frequency and range?

Topic histogram data-mining

Category Data Science

Evan Gertis · Accepted Answer · October 16, 2021 13:18

I've worked out an example using pen and paper. Essentially, select a beta value, then create a singleton bucket. Then check whether the value at the current position is less than beta; if it is, add it to the bucket. The range and frequency can be read off the histogram, e.g. look at x=1, y=3. We end up with a range of 3 and frequency of 1.
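For comparison, here is a minimal Python sketch of the common textbook definition of a MaxDiff histogram: sort the distinct values, look at the difference between each pair of adjacent values, and put a bucket boundary at the beta-1 largest differences. This is not necessarily the same procedure as the pen-and-paper one above. The function name `maxdiff_histogram` is hypothetical, and breaking ties in favor of the leftmost gap is my own assumption; the definition does not say how to handle ties, and this data set has several gaps of size 1.

```python
from collections import Counter

def maxdiff_histogram(values, beta):
    """Return [((low, high), frequency), ...] with at most `beta` buckets.

    Boundaries go at the beta-1 largest gaps between adjacent distinct
    values. Ties are broken by taking the leftmost gap first (assumption).
    """
    freq = sorted(Counter(values).items())  # [(value, count), ...]
    distinct = [v for v, _ in freq]

    # Gap i sits between distinct[i] and distinct[i + 1].
    gaps = [(distinct[i + 1] - distinct[i], i) for i in range(len(distinct) - 1)]

    # Keep the beta-1 largest gaps (largest first, leftmost on ties).
    largest = sorted(gaps, key=lambda g: (-g[0], g[1]))[:beta - 1]
    cut_after = {i for _, i in largest}

    buckets, current = [], []
    for i, (value, count) in enumerate(freq):
        current.append((value, count))
        # Close the bucket at a chosen boundary or at the last value.
        if i in cut_after or i == len(freq) - 1:
            low, high = current[0][0], current[-1][0]
            buckets.append(((low, high), sum(c for _, c in current)))
            current = []
    return buckets

C = [12, 5, 5, 5, 12, 1, 1, 20, 20, 2, 2, 12, 20, 14, 20, 5, 5, 20,
     15, 15, 2, 2, 1, 3, 3, 5, 5, 5]

histogram = maxdiff_histogram(C, beta=7)
for (low, high), count in histogram:
    print(f"range [{low}, {high}]  frequency {count}")
```

Under these assumptions the seven buckets are [1,1] with frequency 3, [2,2] with 4, [3,3] with 2, [5,5] with 8, [12,12] with 3, [14,15] with 3, and [20,20] with 5. The range of each bucket is its lowest and highest value, and the frequency is the number of data points that fall inside it. A different tie-breaking rule would merge a different pair of neighbors (1 and 2, or 2 and 3) instead of 14 and 15.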
