Interpretation of Similarity Number generated by LogLikelihood in Mahout
I have a pretty basic question and I was hoping someone could help me. I'm not a math person and I'm fairly new to Mahout, so I'm looking for a poor man's explanation.
It is a typical order recommendation system.
I have a database with around 699,445 orders. These orders have items which were “purchased”.
I ran the following mahout command:
mahout itemsimilarity --input /mnt/p1.csv --output ./output --similarityClassname SIMILARITY_LOGLIKELIHOOD --booleanData TRUE --threshold 0.9
I decided to spot-check the results.
I took the following line from the output file:
58331 120216 0.9705375406679205
In my input file:
1540 orders have product 58331
35 orders have product 120216
10 orders have both (58331 and 120216)
Putting this in Ted Dunning's terms:
k_11 = 10 k_12 = 25
k_21 = 1531 k_22 = 697889
The similarity number generated by the log-likelihood algorithm between 58331 and 120216 is 0.9705375406679205.
1) What does that mean?
2) Should I recommend 58331 when someone orders 120216? Should I recommend 120216 when someone orders 58331?
3) How do I calculate the entropy used in the LLR formula?
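To make question 3 concrete, here is a minimal Python sketch of the entropy-based LLR calculation as I currently understand it from Ted Dunning's description. The function names are my own (hypothetical, not Mahout's API). The "entropy" here is the unnormalized form N*ln(N) - sum(k*ln(k)). The final mapping 1 - 1/(1 + LLR) from LLR to a similarity is my assumption about what Mahout does. I have not confirmed that this reproduces the 0.9705 from the output file.

```python
import math


def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts:
    # N * ln(N) - sum(k * ln(k)), where N is the sum of the counts
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    # k11: orders with both items, k12 / k21: orders with only one of them,
    # k22: orders with neither
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at 0 to guard against tiny negative values from rounding
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


def similarity(llr):
    # Assumed mapping of LLR (0..infinity) into the range [0, 1)
    return 1.0 - 1.0 / (1.0 + llr)


# The contingency table from my spot check
llr = log_likelihood_ratio(10, 25, 1531, 697889)
sim = similarity(llr)
print(llr, sim)
```

If this is right, a table where the two items occur independently, e.g. log_likelihood_ratio(10, 20, 30, 60), should give an LLR of about 0, and the LLR grows as the co-occurrence gets more surprising.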
Thanks a lot.
Topic apache-mahout recommender-system
Category Data Science