MaxDiff Histogram usage
I would like to understand how the MaxDiff algorithm works. The only example that I could find online was a set of slides by Chris Crifton.
Given an attribute C with the values 12, 5, 5, 5, 12, 1, 1, 20, 20, 2, 2, 12, 20, 14, 20, 5, 5, 20, 15, 15, 2, 2, 1, 3, 3, 5, 5, 5, I'd like to identify the range and frequency of each bucket when creating a MaxDiff histogram for C using beta = 7.
I would be able to figure this out if there were more examples online. Does anyone know where I could find an example? I would appreciate help trying to solve this. The first step is sorting the data:
1 1 1 2 2 2 2 3 3 5 5 5 5 5 5 5 5 12 12 12 14 15 15 20 20 20 20 20
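As a quick check on the sorting step, here is a minimal Python sketch that sorts the values and counts how often each one occurs. The variable names (`C`, `sorted_c`, `freq`) are only illustrative and do not come from the slides.

```python
from collections import Counter

# The 28 values of attribute C, copied from the question.
C = [12, 5, 5, 5, 12, 1, 1, 20, 20, 2, 2, 12, 20, 14, 20,
     5, 5, 20, 15, 15, 2, 2, 1, 3, 3, 5, 5, 5]

sorted_c = sorted(C)
freq = Counter(sorted_c)  # maps each distinct value to its number of occurrences

# Print each distinct value with its frequency, in ascending order.
for value in sorted(freq):
    print(value, freq[value])
```

The counts come out as 1: 3, 2: 4, 3: 2, 5: 8, 12: 3, 14: 1, 15: 2, 20: 5, which adds up to 28 values.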
I took the first bucket boundary to be (max - min)/2 = (20 - 1)/2 = 19/2 = 9.5, which splits the data between 5 and 12:
1 1 1 2 2 2 2 3 3 5 5 5 5 5 5 5 5 | 12 12 12 14 15 15 20 20 20 20 20
Would I then calculate the next bin boundaries as (5-1)/2 = 2 and (20-12)/2 = 4, and add each result to the minimum of its partition (1 + 2 = 3 for the first, 12 + 4 = 16 for the second)?
1 1 1 2 2 2 2 | 3 3 5 5 5 5 5 5 5 5 | 12 12 12 14 15 15 | 20 20 20 20 20
Then (2-1)/2 = 0.5, (5-3)/2 = 1, (15-12)/2 = 1.5
1 1 1 | 2 2 2 2 | 3 3 | 5 5 5 5 5 5 5 5 | 12 12 12 | 14 15 15 | 20 20 20 20 20
I would like to know if I'm doing this correctly. How do I determine the frequency and range?
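For comparison, one common textbook description of MaxDiff does not halve ranges at all: it looks at the difference between each pair of adjacent sorted values and places a bucket boundary at the beta - 1 largest differences. Below is a minimal Python sketch under that assumption. The function name `maxdiff_buckets` is hypothetical, and the tie-breaking rule (prefer the earlier gap when differences are equal) is my own choice, since the definition does not say how to break ties. Other formulations of MaxDiff measure the difference in frequency or area rather than in value, and would need different code.

```python
from collections import Counter

def maxdiff_buckets(values, beta):
    """Return (low, high, frequency) per bucket, cutting at the beta-1 largest gaps."""
    if not values:
        return []
    freq = Counter(values)
    distinct = sorted(freq)
    # Gap between each adjacent pair of distinct values, tagged with the left index.
    gaps = [(distinct[i + 1] - distinct[i], i) for i in range(len(distinct) - 1)]
    # Keep the beta-1 largest gaps. Assumption: ties go to the earlier gap.
    largest = sorted(gaps, key=lambda g: (-g[0], g[1]))[:beta - 1]
    cut_after = sorted(i for _, i in largest)
    buckets, start = [], 0
    # Close a bucket after each cut index, then once more at the last value.
    for i in cut_after + [len(distinct) - 1]:
        group = distinct[start:i + 1]
        buckets.append((group[0], group[-1], sum(freq[v] for v in group)))
        start = i + 1
    return buckets

C = [12, 5, 5, 5, 12, 1, 1, 20, 20, 2, 2, 12, 20, 14, 20,
     5, 5, 20, 15, 15, 2, 2, 1, 3, 3, 5, 5, 5]

for low, high, count in maxdiff_buckets(C, beta=7):
    print(f"range [{low}, {high}]  frequency {count}")
```

For this data the adjacent differences between the distinct values 1, 2, 3, 5, 12, 14, 15, 20 are 1, 1, 2, 7, 2, 1, 5. Six boundaries are needed for beta = 7, so one of the three gaps of size 1 has to stay unsplit. With the tie-breaking rule above that is the gap between 14 and 15, giving the ranges [1,1], [2,2], [3,3], [5,5], [12,12], [14,15], [20,20] with frequencies 3, 4, 2, 8, 3, 3, 5. The range of a bucket is simply its smallest and largest value, and the frequency is the number of data points that fall inside it. A different tie-breaking rule would merge 1 with 2, or 2 with 3, instead.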
Topic histogram data-mining
Category Data Science