Interpretation of the Similarity Number Generated by LogLikelihood in Mahout

I have a pretty basic question and I was hoping someone could help me. I'm not a math person and I'm fairly new to Mahout, so I'm looking for a poor man's explanation.

This is a typical order-based recommendation system.

I have a database of around 699,445 orders. Each order contains the items that were "purchased".

I ran the following mahout command:

mahout itemsimilarity --input /mnt/p1.csv --output ./output --similarityClassname SIMILARITY_LOGLIKELIHOOD --booleanData TRUE --threshold 0.9

I decided to spot-check the results.

I took the following line from the output file:

58331   120216  0.9705375406679205

In my input file:

1540 orders have product 58331
35 orders have product 120216
10 orders have both products (58331 and 120216)
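One way to spot-check counts like these is to recompute them straight from the input file. The sketch below is a minimal Python example, assuming /mnt/p1.csv holds one orderID,itemID pair per line (the userID,itemID layout that mahout itemsimilarity reads, with orders standing in for users). The function name count_orders is hypothetical.

```python
import csv
from collections import defaultdict


def count_orders(lines, item_a, item_b):
    """Count orders containing item_a, item_b, both, and in total.

    Assumes each line looks like "orderID,itemID"; an optional third
    preference column is ignored. An order that lists the same item
    twice is counted once, in the spirit of --booleanData.
    """
    items_by_order = defaultdict(set)
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        order_id, item_id = row[0].strip(), row[1].strip()
        items_by_order[order_id].add(item_id)

    n_a = n_b = n_ab = 0
    for items in items_by_order.values():
        has_a = item_a in items
        has_b = item_b in items
        n_a += has_a
        n_b += has_b
        n_ab += has_a and has_b
    return n_a, n_b, n_ab, len(items_by_order)
```

To use it on the real data, open the input file with newline="" and pass the file object together with the two item IDs as strings, e.g. "58331" and "120216".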

Putting this in Ted Dunning's terms:

k_11 = 10     k_12 = 25
k_21 = 1530   k_22 = 697880
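The four cells follow directly from the three counts and the total number of orders. A minimal sketch of that arithmetic (the function name contingency_table is hypothetical), using the same orientation as the table: k_12 counts orders with 120216 but not 58331, and k_21 counts orders with 58331 but not 120216.

```python
def contingency_table(n_a, n_b, n_ab, n_total):
    """Arrange order counts for items A and B into a 2x2 table.

    k11: orders containing both A and B
    k12: orders containing B but not A
    k21: orders containing A but not B
    k22: orders containing neither
    """
    k11 = n_ab
    k12 = n_b - n_ab
    k21 = n_a - n_ab
    k22 = n_total - n_a - n_b + n_ab  # inclusion-exclusion
    if min(k11, k12, k21, k22) < 0:
        raise ValueError("inconsistent counts: a cell would be negative")
    return k11, k12, k21, k22


# A = 58331 (1540 orders), B = 120216 (35 orders), 10 orders with both.
print(contingency_table(1540, 35, 10, 699445))  # (10, 25, 1530, 697880)
```

The four cells always add up to the total number of orders, which is a quick sanity check on hand-computed values.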

The similarity number that the log-likelihood algorithm generated for 58331 and 120216 is 0.9705375406679205.
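Assuming Mahout's log-likelihood similarity reports the raw log-likelihood ratio (LLR) squashed as 1 - 1/(1 + LLR), which is how its LoglikelihoodSimilarity classes appear to compute it, the reported number can be converted back into a raw LLR score. A minimal sketch of both directions (function names are hypothetical):

```python
def llr_to_similarity(llr):
    """Map a raw log-likelihood ratio (>= 0) into the range [0, 1)."""
    return 1.0 - 1.0 / (1.0 + llr)


def similarity_to_llr(similarity):
    """Invert llr_to_similarity to recover the raw LLR score."""
    return 1.0 / (1.0 - similarity) - 1.0


print(similarity_to_llr(0.9705375406679205))  # about 32.94
print(similarity_to_llr(0.9))                 # about 9, the --threshold cut-off
```

Under that assumption, --threshold 0.9 keeps only pairs whose LLR is above 9. The mapping is monotonic, so it changes the scale of the scores but not their ranking.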

1) What does that mean?

2) Should I recommend 58331 when someone orders 120216? Should I recommend 120216 when someone orders 58331?

3) How do I calculate the entropy used in the LLR formula?
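The entropy terms can be computed from raw counts without converting them to probabilities first: for counts k_1..k_n summing to N, N*H = N*ln(N) - sum of k_i*ln(k_i). The sketch below assumes this is the formulation behind Mahout's LogLikelihood class, where the ratio is 2 * (row entropy + column entropy - matrix entropy); it is a Python illustration of the technique, not Mahout's own code.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of raw counts: N * H, in nats."""
    total = sum(counts)
    return x_log_x(total) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


llr = log_likelihood_ratio(10, 25, 1530, 697880)
print(llr, 1.0 - 1.0 / (1.0 + llr))
```

The second printed value puts the raw score on the same 0-to-1 scale as the 1 - 1/(1 + LLR) mapping. Transposing the table (swapping k_12 and k_21) leaves the score unchanged, so the statistic itself does not distinguish 58331 → 120216 from 120216 → 58331.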

Thanks a lot!

Topic apache-mahout recommender-system
