How to model a supervised recommender system with varying data

Suppose there are 2000 movies and a company wants to recommend some movies (for example, at most 5 movies) to each visitor. The objective is to learn how to predict which movie will be selected if a specific set of movies is recommended.

   option-1  option-2  option-3  option-4  option-5  Selected-Movie
1. movie1    movie3    movie4                        movie4 
2. movie3    movie4    movie100  movie1000 movie1001 movie1001
3. movie4    movie5    movie34                       movie34

Based on this data set, I want to learn that when the movies of sample 1 are suggested to a customer, he will select movie4. Because the number of features can be so high (here 2000 movies), I think one-hot encoding would not be a good option. Since at most 5 movies can be recommended, I thought it might be a good option to use a vector of size 5, with the unused positions set to 0 when fewer than 5 movies are recommended. However, with this representation the permutation of the movies matters: for example, (1,2,3,4,5) is different from (2,1,3,4,5), whereas I want to treat both cases as the same. In other words, all 5! permutations should be considered identical. Moreover, with this representation I think it will not be possible to use a Decision Tree, whereas some algorithms like CatBoost work.
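To make the issue concrete, here is a small toy illustration (my own, using NumPy) of the padded length-5 representation and its sensitivity to order:

```python
import numpy as np

# Padded length-5 representation: movie indices, with 0 used for empty slots.
# Sample 1 from the table above: movie1, movie3, movie4 recommended.
sample_1 = np.array([1, 3, 4, 0, 0])

# The same recommendation in a different order produces a different vector,
# although I want both to be treated as the same input.
sample_1_permuted = np.array([3, 1, 4, 0, 0])
print(np.array_equal(sample_1, sample_1_permuted))  # False
```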

My preference is for algorithms that can generate rules, like Decision Trees. I would be thankful for any recommendations on the data representation and on how the features should be constructed.



In this task you're missing something: you don't have any features to represent a specific visitor.

This means that the best movie your model can predict is the one which is selected most often by any visitor, given the proposed set. As a consequence, the only thing the model can learn from such a dataset is to associate every possible set of proposed movies with the most frequently selected movie given this set. Of course it would have to generalize a bit for sets which don't appear in the training data, but that's about all the ML there is in this task. In theory the task could be done simply by counting the joint frequency of (proposed set, selected movie) in order to estimate the conditional probability of the selected movie given the proposed set.
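As a minimal sketch of this counting idea (my own illustration, using the three rows from the question as a toy dataset):

```python
from collections import Counter, defaultdict

# Toy training data: (set of proposed movies, selected movie).
data = [
    ({"movie1", "movie3", "movie4"}, "movie4"),
    ({"movie3", "movie4", "movie100", "movie1000", "movie1001"}, "movie1001"),
    ({"movie4", "movie5", "movie34"}, "movie34"),
]

# Count how often each movie is selected for each exact set of proposals.
counts = defaultdict(Counter)
for proposed, selected in data:
    counts[frozenset(proposed)][selected] += 1   # frozenset ignores order

def predict(proposed):
    """Return the most frequently selected movie for this exact set, if seen before."""
    key = frozenset(proposed)
    if key in counts:
        return counts[key].most_common(1)[0][0]
    return None  # unseen set: this is where some generalization would be needed

print(predict({"movie1", "movie3", "movie4"}))   # -> movie4
```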


[edit]

The standard way to represent a set would be one-hot encoding. It's doable even with 2000 features, assuming there are enough instances in the data. With this basic option you could certainly reduce the number of features by removing the movies which are never or rarely proposed, or never or rarely selected. This is unlikely to hurt performance, because if there are no (or very few) cases where a movie is proposed/selected, the model cannot use this information anyway (or if it does, it is likely to cause overfitting).
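For illustration, a minimal multi-hot encoding sketch (my own, assuming the 2000 movies are indexed 0..1999):

```python
import numpy as np

N_MOVIES = 2000  # assumed catalogue size

def encode(proposed_ids):
    """Multi-hot encode a set of proposed movie indices (order-independent)."""
    x = np.zeros(N_MOVIES, dtype=np.int8)
    x[list(proposed_ids)] = 1
    return x

# The same set of movies always maps to the same vector, whatever the order.
assert np.array_equal(encode({0, 2, 3}), encode({3, 0, 2}))
```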

I can think of a couple of alternative approaches:

  • Apply clustering to the one-hot-encoded sets of movies (without the target variable), then train a separate model for every cluster: this way each model has fewer features to deal with (see the first sketch after this list).

  • Simply rank all the movies by order of preference, based on how often each movie is selected. Given that there is no information about the user's preferences, it can be assumed that there is a total order on the movies (see the second sketch after this list):

      • the top-ranked movie is always selected, whatever the other proposed movies are;

      • the 2nd-ranked movie is always selected, unless the first one is proposed as well;

      • ... the $n^{th}$-ranked movie is always selected, unless a higher-ranked movie is proposed as well.

To predict the selected movie, just pick the highest ranked among the 3-5 proposed.
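A rough sketch of the first (clustering) option, assuming scikit-learn is available; the data here is random placeholder data purely to make the sketch runnable:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Placeholder multi-hot data: 200 instances over 2000 movies, ~4 proposals per row.
X = (rng.random((200, 2000)) < 0.002).astype(int)
y = rng.integers(0, 2000, size=200)              # placeholder "selected movie" ids

# Cluster the proposed sets (without the target), then fit one tree per cluster,
# keeping only the movies that actually appear in that cluster.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
models = {}
for c in range(kmeans.n_clusters):
    mask = kmeans.labels_ == c
    cols = X[mask].any(axis=0)                   # drop movies never proposed in this cluster
    models[c] = (cols, DecisionTreeClassifier().fit(X[mask][:, cols], y[mask]))

def predict(x):
    """Route an instance to its cluster's tree, using that cluster's reduced features."""
    c = kmeans.predict(x.reshape(1, -1))[0]
    cols, tree = models[c]
    return tree.predict(x[cols].reshape(1, -1))[0]
```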
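And a minimal sketch of the second (ranking) option, again using the rows from the question as toy data:

```python
from collections import Counter

train = [
    (["movie1", "movie3", "movie4"], "movie4"),
    (["movie3", "movie4", "movie100", "movie1000", "movie1001"], "movie1001"),
    (["movie4", "movie5", "movie34"], "movie34"),
]

# Rank movies by how often they are selected overall (no user information needed).
selection_counts = Counter(selected for _, selected in train)

def predict(proposed):
    """Pick the proposed movie that is selected most often overall."""
    return max(proposed, key=lambda m: selection_counts.get(m, 0))

print(predict(["movie3", "movie4", "movie5"]))   # -> movie4 with this toy data
```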
