Recommend products based on historical queries of other users

Given user data like the following:

   user   query       date
0  jack   mango 2020-01-03
1  jack  banana 2020-01-04
2  jack   apple 2020-02-03
3  jack  orange 2020-03-03
4  john    meat 2020-07-03
5  john   water 2020-07-03

Now suppose a new user enters mango. I am looking for a good way to recommend products to that user.

One approach, based on item2vec, is the following:

import pandas as pd
from gensim.models import Word2Vec

df_user = pd.DataFrame({
    'user': ['jack', 'jack', 'jack', 'jack', 'john', 'john'],
    'query': ['mango', 'banana', 'apple', 'orange', 'meat', 'water'],
    'date': ['2020-1-3', '2020-1-4', '2020-2-3', '2020-3-3', '2020-7-3', '2020-7-3'],
})
df_user['date'] = pd.to_datetime(df_user['date'])

new_query = 'mango'

# Each user's full query history is treated as one "sentence".
sentences = df_user.groupby('user')['query'].agg(list).tolist()
model = Word2Vec(sentences=sentences, window=9999999, min_count=1)
model.wv.most_similar(new_query, topn=10)

Strangely, it gives

[('banana', 0.09904204308986664),
 ('orange', 0.004004828631877899),
 ('water', -0.022172965109348297),
 ('meat', -0.05908803641796112),
 ('apple', -0.1611100435256958)]

as output, where 'water' and 'meat' rank above 'apple'.

  1. Is there a problem with my implementation?
  2. Is there another good way to solve this problem besides item2vec?


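This seems like a strange way to use Word2Vec, imho. With only two "sentences" and six distinct items, the model has almost nothing to learn from, so the vectors stay close to their random initialization and the similarity scores are likely little more than noise.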

A simple approach:

  1. Calculate the PMI (pointwise mutual information) for every pair of distinct values.
  2. For any new query, pick the value that has the highest PMI with it.

If you just want the most frequently associated value, you can use the conditional probability instead of PMI.
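
A minimal sketch of both scores on the toy data from the question, using only the standard library. It rests on a few assumptions that are not in the original post: each user's full query history counts as one "basket", probabilities are estimated as the fraction of users whose basket contains an item (or a pair), and PMI is taken as log2( P(a, b) / (P(a) P(b)) ). The names `baskets`, `cooccurrence_scores` and `recommend` are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Toy data from the question: one basket of queries per user (assumption).
baskets = {
    "jack": ["mango", "banana", "apple", "orange"],
    "john": ["meat", "water"],
}

def cooccurrence_scores(baskets):
    """Return (pmi, cond) dicts keyed by ordered item pairs (a, b).

    pmi[(a, b)]  = log2(P(a, b) / (P(a) * P(b)))
    cond[(a, b)] = P(b | a)
    Probabilities are fractions of users, which is an assumption.
    """
    n_users = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for items in baskets.values():
        unique = sorted(set(items))  # count each item once per user
        item_count.update(unique)
        pair_count.update(combinations(unique, 2))

    pmi, cond = {}, {}
    for (a, b), n_ab in pair_count.items():
        p_a = item_count[a] / n_users
        p_b = item_count[b] / n_users
        p_ab = n_ab / n_users
        pmi[(a, b)] = pmi[(b, a)] = math.log2(p_ab / (p_a * p_b))
        cond[(a, b)] = n_ab / item_count[a]  # P(b | a)
        cond[(b, a)] = n_ab / item_count[b]  # P(a | b)
    return pmi, cond

def recommend(query, scores, topn=3):
    """Rank items that co-occurred with `query`, highest score first."""
    ranked = [(b, s) for (a, b), s in scores.items() if a == query]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))[:topn]

pmi, cond = cooccurrence_scores(baskets)
print(recommend("mango", pmi))   # ranking by PMI
print(recommend("mango", cond))  # ranking by conditional probability
```

On this data every item in jack's basket ties, and 'meat' and 'water' never appear for mango because they never co-occur with it. Pairs that never co-occur are simply left out here rather than given a PMI of minus infinity. With more users, PMI tends to favour rare but specific pairings, while the conditional probability favours items that are popular overall.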
