MaxDiff Histogram usage

I would like to understand how the MaxDiff algorithm works. The only example that I could find online was a set of slides by Chris Crifton.

Given an attribute C with the values 12, 5, 5, 5, 12, 1, 1, 20, 20, 2, 2, 12, 20, 14, 20, 5, 5, 20, 15, 15, 2, 2, 1, 3, 3, 5, 5, 5, I'd like to identify the range and frequency of each bucket when creating a MaxDiff histogram for C with beta = 7.

I would be able to figure this out if there were more examples online. Does anyone know where I could find an example? I would appreciate help trying to solve this. The first step is sorting the data:

1 1 1 2 2 2 2 3 3 5 5 5 5 5 5 5 5 12 12 12 14 15 15 20 20 20 20 20
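For reference, the sorted list and the per-value counts can be reproduced with a few lines of Python. This is only a convenience check on the data above; the names `data` and `freq` are illustrative and not part of any MaxDiff library.

```python
from collections import Counter

# The attribute C exactly as given in the question (28 values).
data = [12, 5, 5, 5, 12, 1, 1, 20, 20, 2, 2, 12, 20, 14,
        20, 5, 5, 20, 15, 15, 2, 2, 1, 3, 3, 5, 5, 5]

# Sorted view of the raw values.
print(sorted(data))

# Frequency of each distinct value, in ascending order of value.
freq = Counter(data)
for value in sorted(freq):
    print(value, freq[value])
```

The counts come out as 1:3, 2:4, 3:2, 5:8, 12:3, 14:1, 15:2, 20:5, which sum to 28.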

The bucket boundary is (max-min)/2 = (20-1)/2 = (19)/2 = 9.5

1 1 1 2 2 2 2 3 3 5 5 5 5 5 5 5 5 | 12 12 12 14 15 15 20 20 20 20 20

Would I then calculate the next bin boundaries as (5-1)/2 = 2 and (20-12)/2 = 4, and then add 4 to the minimum of the second partition?

1 1 1 2 2 2 2 | 3 3 5 5 5 5 5 5 5 5 | 12 12 12 14 15 15 | 20 20 20 20 20

Then (2-1)/2 = 0.5, (5-3)/2 = 1, (15-12)/2 = 1.5

1 1 1 | 2 2 2 2 | 3 3 | 5 5 5 5 5 5 5 5 | 12 12 12 | 14 15 15 | 20 20 20 20 20

I would like to know if I'm doing this correctly. How do I determine the frequency and range?



I've worked out an example using pen and paper. Essentially, select a beta value, then create a singleton bucket. Then check whether the value at the current position is less than beta; if it is, add it to the bucket. The range and frequency can be read off the histogram, e.g. look at x=1, y=3. We end up with a range of 3 and frequency of 1.
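For comparison, here is a minimal Python sketch of the textbook-style description of MaxDiff as I understand it: sort the distinct values, take the difference between each pair of adjacent values, and put a bucket boundary at the beta - 1 largest differences. This is not necessarily the same procedure as the pen-and-paper one above. The function name `maxdiff_histogram` is hypothetical, and breaking ties in favour of the leftmost gap is my own assumption, since the definition does not say how to handle equal differences.

```python
from collections import Counter

def maxdiff_histogram(data, beta):
    """Return a list of (low, high, frequency) buckets (illustrative sketch)."""
    counts = Counter(data)
    values = sorted(counts)  # distinct values, ascending
    if beta >= len(values):
        # Enough buckets for every distinct value to get its own.
        return [(v, v, counts[v]) for v in values]
    # Difference between each pair of adjacent distinct values, with its index.
    gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]
    # Keep the beta - 1 largest gaps; ties go to the leftmost gap (assumption).
    largest = sorted(gaps, key=lambda g: (-g[0], g[1]))[:beta - 1]
    cut_after = sorted(i for _, i in largest)
    buckets, start = [], 0
    for i in cut_after + [len(values) - 1]:
        group = values[start:i + 1]
        buckets.append((group[0], group[-1], sum(counts[v] for v in group)))
        start = i + 1
    return buckets

data = [12, 5, 5, 5, 12, 1, 1, 20, 20, 2, 2, 12, 20, 14,
        20, 5, 5, 20, 15, 15, 2, 2, 1, 3, 3, 5, 5, 5]

for low, high, frequency in maxdiff_histogram(data, beta=7):
    print(f"range [{low}, {high}]  frequency {frequency}")
```

With beta = 7 the adjacent differences are 1, 1, 2, 7, 2, 1, 5, so under this tie-breaking rule only 14 and 15 share a bucket. The resulting buckets are [1,1]:3, [2,2]:4, [3,3]:2, [5,5]:8, [12,12]:3, [14,15]:3 and [20,20]:5. A different tie-breaking rule would merge 1 and 2, or 2 and 3, instead.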

