In any commerce setting, the concept of item similarity is not straightforward. Two users who usually buy the same kinds of products can be considered similar, but the same cannot be said of two items bought by the same user.
There are two different notions of item similarity for recommendation purposes. One is whether the two items are physically similar, for example Blue Reebok Shoes and Red Reebok Shoes. The other is whether they depend on each other functionally, for example Reebok Shoes and Reebok Socks.
To find physically similar items, one can describe each product with a dictionary of attributes and compute a Jaccard similarity over those attributes.
For example:
Item A = {color: Blue, size: 10, material: Cotton, brand: Reebok}
Item B = {color: Red, size: 10, material: Cotton, brand: Reebok}
Treating each attribute as one element, the intersection is the set of attributes whose values match and the union is the set of all attributes, i.e.
Intersection(A,B) = {size, material, brand}
Union(A,B) = {color, size, material, brand}
Jaccard Index = |Intersection| / |Union| = 3/4 = 0.75
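The attribute comparison above can be sketched in Python as follows. This is only an illustration: the function names attribute_match_ratio and jaccard are made up for this example, and the items are the two from the example. It also shows a stricter variant that is an assumption on my part, not something stated above: if the set elements are (attribute, value) pairs rather than attribute names, the union has 5 elements (both colors count), so the score drops to 3/5.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attribute_match_ratio(x, y):
    """Share of attributes whose values agree, out of all attributes seen.

    Hypothetical helper: each attribute name counts as one element,
    which is how the worked example above arrives at 3/4.
    """
    keys = x.keys() | y.keys()
    matching = {k for k in keys if k in x and k in y and x[k] == y[k]}
    return len(matching) / len(keys) if keys else 0.0


item_a = {"color": "Blue", "size": 10, "material": "Cotton", "brand": "Reebok"}
item_b = {"color": "Red", "size": 10, "material": "Cotton", "brand": "Reebok"}

# Attribute-level score, as in the worked example.
by_attribute = attribute_match_ratio(item_a, item_b)

# Stricter variant (assumption): Jaccard over (attribute, value) pairs.
by_pair = jaccard(set(item_a.items()), set(item_b.items()))

print(by_attribute)  # 0.75
print(by_pair)       # 0.6
```

Which variant to use is a design choice. The attribute-level ratio is easier to explain, while the pair-level version also works when items do not share the same attribute names.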
To find functionally dependent items, a commonly used behavioral proxy is co-purchase: the more often two items are bought together in the same session, the more they are assumed to depend on each other, and the more valuable the recommendation. For this setting, one can build a matrix recording which products each user buys in a single session. For m users and n products this is a sparse m × n matrix. Read column-wise, the same matrix gives, for each item, the set of users who bought it in a particular session.
Thus, for example:
Item A = {Ua, Ub, Uc}
Item B = {Ub, Ud}
Intersection(A,B) = {Ub}
Union(A,B) = {Ua, Ub, Uc, Ud}
Jaccard Index = 1/4 = 0.25
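A minimal Python sketch of the co-purchase version, assuming the session data is available as a mapping from user to the set of items bought. The purchases data and the helper name buyers_per_item are hypothetical and chosen to reproduce the example; plain sets stand in for the sparse matrix, and inverting the mapping corresponds to reading it column-wise.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def buyers_per_item(purchases):
    """Invert user -> items into item -> users (the column-wise view)."""
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)
    return dict(buyers)


# Hypothetical session data: rows of the user x product matrix.
purchases = {
    "Ua": {"A"},
    "Ub": {"A", "B"},
    "Uc": {"A"},
    "Ud": {"B"},
}

buyers = buyers_per_item(purchases)
score = jaccard(buyers["A"], buyers["B"])

print(sorted(buyers["A"]))  # ['Ua', 'Ub', 'Uc']
print(sorted(buyers["B"]))  # ['Ub', 'Ud']
print(score)                # 0.25
```

For a real catalogue the same computation would normally be done on a sparse matrix rather than Python sets, but the score is the same.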