Music Recommender using Implicit Library

I want to build a music recommender that predicts the number of times a user will listen to a song. I am using the Implicit library and following this closely related example: https://github.com/benfred/implicit/blob/main/examples/tutorial_lastfm.ipynb How can I predict the number of plays for a given user and a specific song? All I can find there and in the documentation is how to recommend songs to a given user with proximity scores, without the actual predicted play count.
Category: Data Science
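
A minimal sketch of what such proximity scores are, assuming a matrix-factorization model in which the score for a user/song pair is the dot product of their latent factor vectors. The factor values and names below are made up for illustration; with a fitted Implicit model they would come from its `user_factors` and `item_factors` arrays. The result is a relative preference score that is useful for ranking, not a predicted play count.

```python
def preference_score(user_vector, item_vector):
    """Dot product of a user's and an item's latent factors."""
    return sum(u * i for u, i in zip(user_vector, item_vector))


# Hypothetical 2-dimensional factors (illustration only).
user_factors = {"alice": [0.9, 0.1], "bob": [0.2, 0.8]}
item_factors = {"song_a": [1.0, 0.0], "song_b": [0.1, 0.9]}

# Score every song for one user, then rank songs by score.
scores = {
    song: preference_score(user_factors["alice"], vector)
    for song, vector in item_factors.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

If an actual play count is needed, one option is to fit a separate regression on top of these scores, which is outside what this sketch shows.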

How do I perform Leave One Out Cross Validation for Top-N Recommendation Systems?

I am new to building recommendation systems. I am using the surpriselib library to evaluate my recommendations. All the accuracy metrics are well supported in this library, but I also want to compute the hit rate of my top-n recommender system. I know the formula for hit rate is: (no. of items users have already purchased)/(no. of users). But this does not make sense to me because to train and test the user vs item ratings I have only …
Category: Data Science
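
A minimal sketch of leave-one-out hit rate, assuming the common definition: hold out one item per user, train on the rest, and count a hit when the held-out item appears in that user's top-n list. The purchase histories and the `top_n` recommendations are hypothetical stand-ins for the output of a trained model; this does not use surpriselib's own API.

```python
def leave_one_out_split(user_items):
    """Hold out the last item of each user; keep the rest for training."""
    train, held_out = {}, {}
    for user, items in user_items.items():
        if len(items) < 2:
            # Too little history to hold anything out.
            train[user] = list(items)
            continue
        train[user] = list(items[:-1])
        held_out[user] = items[-1]
    return train, held_out


def hit_rate(top_n_by_user, held_out):
    """Fraction of users whose held-out item is in their top-n list."""
    if not held_out:
        return 0.0
    hits = sum(
        1 for user, item in held_out.items()
        if item in top_n_by_user.get(user, [])
    )
    return hits / len(held_out)


history = {"u1": ["a", "b", "c"], "u2": ["b", "d"], "u3": ["a", "e", "f"]}
train, held_out = leave_one_out_split(history)

# Hypothetical top-2 lists produced by a model trained on `train`.
top_n = {"u1": ["c", "d"], "u2": ["a", "c"], "u3": ["f", "b"]}
rate = hit_rate(top_n, held_out)
print(rate)
```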

How does ALS implementation calculate ratings when model.transform is called?

The Spark ALS model is based on this paper: Collaborative Filtering for Implicit Feedback Datasets. Here, latent vectors are learnt such that, instead of estimating R (the ratings matrix), they only estimate P (the preference matrix, a binary matrix based on whether the user has interacted with the item or not). R is broken down into P and C (the confidence matrix). Question: As C is not estimated, how is it possible that model.transform(dataset) accurately predicts ratings R? The implementation is a dead end …
Category: Data Science
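
A minimal sketch of how R is split into P and C, assuming the implicit-feedback formulation from the cited paper: the preference is 1 where an interaction exists, and the confidence grows linearly with the observed count. The matrix values and the `alpha` setting are illustrative assumptions. Under this formulation the factor dot product approximates P, with C acting only as a weight in the loss, so the model output is a preference estimate rather than a reconstruction of R.

```python
def preference_and_confidence(ratings, alpha=40.0):
    """Derive the binary preference and linear confidence matrices from R."""
    preference = [[1 if r > 0 else 0 for r in row] for row in ratings]
    confidence = [[1.0 + alpha * r for r in row] for row in ratings]
    return preference, confidence


# Hypothetical interaction counts: 2 users x 3 items.
R = [
    [3, 0, 1],
    [0, 5, 0],
]
P, C = preference_and_confidence(R, alpha=2.0)
print(P)
print(C)
```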

How to split train/test in recommender systems

I am working with the MovieLens10M dataset, predicting user ratings. If I want to fairly evaluate my algorithm, how should I split my training v. test data? By default, I believe the data is split into train v. test sets where 'test' contains movies previously unseen in the training set. If my model requires each movie to have been seen at least once in the training set, how should I split my data? Should I take all but N of …
Category: Data Science
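
One possible splitting scheme, sketched under the assumption that ratings are `(user, movie, rating)` tuples: hold out a few ratings per user, then move any held-out rating back into training if its movie would otherwise be unseen there. The function name and the toy data are hypothetical.

```python
import random


def holdout_per_user(ratings, n_test=1, seed=0):
    """Hold out n_test ratings per user; every test movie stays in train."""
    rng = random.Random(seed)
    by_user = {}
    for row in ratings:
        by_user.setdefault(row[0], []).append(row)

    train, test = [], []
    for rows in by_user.values():
        rows = rows[:]
        rng.shuffle(rows)
        if len(rows) > n_test:
            test.extend(rows[:n_test])
            train.extend(rows[n_test:])
        else:
            # Not enough history to hold anything out.
            train.extend(rows)

    # Move test rows back if their movie never appears in training.
    train_movies = {movie for _, movie, _ in train}
    kept = []
    for row in test:
        if row[1] in train_movies:
            kept.append(row)
        else:
            train.append(row)
            train_movies.add(row[1])
    return train, kept


ratings = [
    ("u1", "m1", 4), ("u1", "m2", 5), ("u1", "m3", 3),
    ("u2", "m1", 2), ("u2", "m2", 4),
    ("u3", "m3", 5), ("u3", "m4", 1),
]
train, test = holdout_per_user(ratings, n_test=1)
print(len(train), len(test))
```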

When to stop showing content on recommendation engines?

Let's take an example. I log into my Netflix account and see that it's suggesting the show Friends to me. But I have no interest in watching Friends, so I ignore it. The next time I log in, it suggests Friends again and I ignore it again. This goes on for quite a few logins until Netflix finds content that is probably more relevant to me than Friends. My question is this: Is there a way to use the information …
Category: Data Science

Spark ALS-WR giving the same recommended items for all users

We are trying to build a recommendation system for a supermarket with diverse item types (ranging from fast-moving groceries to slow-moving electronic items). Some items are purchased frequently and in high volume, and some items are purchased only once. We have 4 months of purchase history data from 25K+ customers across 30K+ SKUs from 100+ departments. We ran ALS-WR in Spark to generate recommendations. To our surprise, the top 15 recommendations we receive for each customer are quite generic, without much …
Category: Data Science

Can I get un-normalized vectors from the TF USE model?

I'm using this Universal Sentence Encoder (USE) model to get embeddings of a set of texts, each text corresponding to a newspaper article. In order to build a Recommender System, I generate user embeddings by averaging the embeddings of items a user has read, and then I look for other texts that are cosine-similar to this user (basically, the method returns a set of items that are similar to this user embedding). Now, the problem is that the mentioned model …
Category: Data Science
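
The averaging-plus-cosine approach described above can be sketched as follows. The 3-dimensional vectors and article names are made up for illustration (real USE embeddings are much longer), and this does not call the USE model itself. Note that cosine similarity ignores the length of the vectors it compares, so vector magnitude only matters at the averaging step.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical article embeddings.
article_embeddings = {
    "politics_1": [1.0, 0.0, 0.0],
    "politics_2": [0.8, 0.6, 0.0],
    "politics_3": [0.9, 0.3, 0.1],
    "sports_1": [0.0, 0.0, 1.0],
}
read = ["politics_1", "politics_2"]

# User embedding = average of the embeddings of the articles the user read.
user_embedding = mean_vector([article_embeddings[a] for a in read])

# Rank unread articles by cosine similarity to the user embedding.
candidates = [a for a in article_embeddings if a not in read]
ranked = sorted(
    candidates,
    key=lambda a: cosine_similarity(user_embedding, article_embeddings[a]),
    reverse=True,
)
print(ranked)
```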

How is the input given to the NeuMF architecture?

I was going through this neural recommendation paper (Fig. 2). I want to implement it from scratch in TensorFlow. The thing I don't understand is how the input is given to this architecture. Can someone explain with a small example? If I am right, the embedding latent factors have to be $M\times K$ and $N\times K$ after passing through the embedding layer? If I have a $Users \times Items$ rating matrix: [[2, NaN, 4], [3, 1, NaN], [4, NaN, 5]] They …
Category: Data Science
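
A small sketch of one way to turn the rating matrix above into model inputs, assuming (as in neural collaborative filtering setups generally) that each training example is a single (user, item) pair identified by index, with the embedding layers acting as lookups into $M\times K$ and $N\times K$ tables. The helper names are hypothetical, and keeping the raw rating as the label is an assumption; an implicit-feedback setup would use a 0/1 label instead.

```python
import math

ratings = [
    [2, math.nan, 4],
    [3, 1, math.nan],
    [4, math.nan, 5],
]


def observed_triples(matrix):
    """List (user_index, item_index, rating) for every non-missing entry."""
    return [
        (u, i, r)
        for u, row in enumerate(matrix)
        for i, r in enumerate(row)
        if not math.isnan(r)
    ]


def one_hot(index, size):
    return [1 if k == index else 0 for k in range(size)]


triples = observed_triples(ratings)
num_users, num_items = len(ratings), len(ratings[0])

# One training example: two one-hot ID vectors plus a label.
user_index, item_index, label = triples[0]
user_input = one_hot(user_index, num_users)
item_input = one_hot(item_index, num_items)
print(triples)
print(user_input, item_input, label)
```

In practice the one-hot vectors are rarely materialized; passing the integer indices to an embedding lookup gives the same result.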

How do I correctly build a model on given data to predict a target parameter?

I have a dataset which contains different parameters, and data.head() looks like this. I applied some preprocessing and performed feature ranking:

    import pandas as pd

    dataset = pd.read_csv("ML.csv", header=0)

    # Get dataset brief
    print(dataset.shape)
    print(dataset.isnull().sum())
    # print(dataset.head())

    # Data pre-processing: drop columns that are not used as features
    data = dataset.drop(columns=['organization_id', 'status', 'city'])

    # Fill NaN values in these features with the column median
    for column in ['zip', 'role_id', 'specialty_id', 'latitude', 'longitude']:
        data[column] = data[column].fillna(data[column].median())

    # Fill years_of_experience with 0
    data['years_of_experience'] = data['years_of_experience'].fillna(0)

    target = dataset.location_id …
Category: Data Science

How to update item and user factors ALS in Group Specific Recommendation?

I was going through this Group Specific Recommendation System paper. I want to implement this from scratch. I see that they have used Alternating Least Squares. But how are they updating the item factors and user factors? Do I need to find the gradient of equations (5), (6), (7), (8)? The algorithm I am talking about is in Section 3.2. Can someone help me visualize, via an example, how the calculation happens? Let me give a short example. …
Category: Data Science

Can a recommendation system be used as a binary classifier?

I have a computer-generated music project, and I'd like to classify short passages of music as "good" or "bad" via machine learning. I won't have a large training set. I'll start by generating 500 examples each of good and bad music, manually. These examples can be transposed and mirror-imaged to produce 12,000 examples of each good and bad. I have a way of extracting features from the music in an intelligent way that mimics the way a perceptive listener would …
Category: Data Science

How to combine recommended lists produced by two different models?

Suppose there are two algorithms that I use to generate recommendations for a user, the first one producing list A, the second one producing list B, both of length $k$. Is there a clever way of combining the two lists (to create a hybrid approach, so to say) into a final list C of the same length as A and B? I suppose that some scoring could be possible, but I'm not sure what to base the scores on. Another …
Category: Data Science
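
One scoring option, named plainly: reciprocal rank fusion, which scores each item by summing 1/(smoothing + rank) over the lists it appears in. This is a sketch of that one technique, not the only way to build a hybrid; the item names and the `smoothing` value of 60 are illustrative assumptions.

```python
def reciprocal_rank_fusion(ranked_lists, length, smoothing=60):
    """Merge ranked lists by summing 1 / (smoothing + rank) per item."""
    scores = {}
    for ranked in ranked_lists:
        for rank, item in enumerate(ranked, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (smoothing + rank)
    # Sort by descending score; break ties by item name for determinism.
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return ordered[:length]


list_a = ["x", "y", "z"]
list_b = ["y", "w", "x"]
list_c = reciprocal_rank_fusion([list_a, list_b], length=3)
print(list_c)
```

Items that appear in both lists accumulate two terms, so agreement between the models is rewarded without needing the models' raw scores to be comparable.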

Building a content-based recommendation system using products' metadata as features?

I am currently working on an apparel recommendation system, where I have tabulated data containing a list of products with their respective metadata (brand, category, color etc.) I have an additional column of client ids to denote which client has bought which product. I want this content-based recommendation system to recommend a client a bunch of products, based on the metadata of the products they have purchased in the past. I am trying to find a way to learn user …
Category: Data Science

In what ways can I find two similar sets of customers using KNN?

I have a study where I want to find users similar to a set of users (SEED). My data looks like a pivot by customer, e.g. a sample of SEED looks like this (note I drop cust_id):

cust_id | spend_food | spend_nike | spend_harrods
1 | 145 | 45 | 32
2 | 85 | 89 | 0
4 | 23 | 67 | 1900
5 | 84 | 12 | 900

So to find users similar …
Category: Data Science
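
A minimal nearest-neighbour sketch for this setup, assuming plain Euclidean distance on the raw spend columns: for each SEED row, take the k closest candidate customers and pool them. The seed rows come from the sample above; the candidate customers and their IDs are hypothetical. Feature scaling is left out here, although a large-valued column such as spend_harrods will dominate the distance without it.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_to_seed(seed_rows, candidates, k=1):
    """Pool the k nearest candidates of every seed row into one set."""
    found = set()
    for seed_row in seed_rows:
        ranked = sorted(
            candidates,
            key=lambda cust_id: euclidean(seed_row, candidates[cust_id]),
        )
        found.update(ranked[:k])
    return found


# SEED sample (cust_id dropped): spend_food, spend_nike, spend_harrods.
seed = [[145, 45, 32], [85, 89, 0], [23, 67, 1900], [84, 12, 900]]

# Hypothetical non-seed customers keyed by cust_id.
candidates = {
    10: [140, 50, 30],
    11: [20, 70, 1850],
    12: [1000, 1000, 0],
}
similar = nearest_to_seed(seed, candidates, k=1)
print(sorted(similar))
```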

Recommender system based on clusters

I'm wondering if this is a correct approach to building recommender systems: My problem: Recommend phone devices, i.e. you have device X and you are likely to switch to device Y. Understand the data. I want to know the influence of each dimension on the device switch. How should I do it? With a correlation matrix? Assign one ID to each switch and check the CM? For example, the switch may differ by country, etc. Once I know the implications of each …
Category: Data Science

What kind of statistical test can be performed in a recommender system dataset that predicts the ratings for the movies?

The dataset consists of thousands of users, and each row of the dataset consists of the user_id, the movie_id and the rating the user gave to the movie, e.g. 1,56,5. In my experiment I am calculating the MSE and precision using a collaborative filtering model. The error comes from the difference between predicted and actual ratings. I now want to conduct a statistical test. Which statistical test should be performed, and how? Thanks in advance.
Category: Data Science

Song playlist recommendation system

I want to build a recommender system to suggest similar songs to continue a playlist (similar to what Spotify does by recommending similar songs at the end of a playlist). I want to build two models: one based on collaborative filtering and another one, a content-based model, to compare their results and choose the best one. Now, I have two questions: Where can I find a dataset with useful data for this type of work? How can I measure the …
Category: Data Science

Which metrics for evaluating a recommender system with implicit data?

I am currently in the process of creating a recommender system. This recommender system works with a neural network and then searches for the closest neighbors and thus gives recommendations for a user. The data is implicit: I only know which products a user has bought. On the basis of this data, I create the recommendations. What are the best metrics to evaluate this recommender system with implicit data? Can I evaluate the model and then the search …
Category: Data Science
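
Two ranking metrics commonly used with purchase-only data are precision@k and recall@k, computed against purchases held out from training. A minimal sketch, with hypothetical product IDs and a hypothetical recommendation list standing in for the model output:

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that were actually bought."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k


def recall_at_k(recommended, relevant, k):
    """Share of the held-out purchases that appear in the top k."""
    if not relevant:
        return 0.0
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / len(relevant)


# Hypothetical model output and held-out purchases for one user.
recommended = ["p1", "p7", "p3", "p9"]
held_out_purchases = {"p3", "p5", "p1"}

precision = precision_at_k(recommended, held_out_purchases, k=3)
recall = recall_at_k(recommended, held_out_purchases, k=3)
print(precision, recall)
```

Averaging these values over all users gives one number per metric for the whole system.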

A weird result from a recommender system

Say there are 10 most popular items among 100 products for sale, and about 100k users regularly purchase items on a daily basis. A = has been purchased by 100k users. B = has been purchased by 30k users. C = has been purchased by 20k users. D = has been purchased by 18k users. E = has been purchased by 10k users. F = has been purchased by 8k users. G = has been purchased by 7k users. H = …
Category: Data Science

Suggestion for Recommender system algorithm for 3 sets of entities

I am building a model to recommend logistic providers to merchants on an e-commerce platform. There are approx. 100k merchants and 20 logistic providers, so scaling is not very important here. Currently, every time a customer makes a purchase from a merchant, the merchant needs to select a logistic provider from the list of 20 to ship the goods to the customer. Some merchants always choose the cheapest shipping option, while some have their own personal preferences. The location of the …
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.