How to split train/test in recommender systems

Question

jamesmf

May 31, 2022, 04:22

I am working with the MovieLens10M dataset, predicting user ratings. If I want to fairly evaluate my algorithm, how should I split my training vs. test data?

By default, I believe the data is split into train vs. test sets where 'test' contains movies previously unseen in the training set. If my model requires each movie to have been seen at least once in the training set, how should I split my data? Should I take all but N of each user's ratings for all the data and evaluate my performance on the held-out N × User_num ratings?

Topic dataset recommender-system machine-learning

Category Data Science

AN6U5 · Accepted Answer · August 28, 2020, 06:54

Leave-one-out cross-validation is probably the most straightforward way to address this. If you happen to be using a model that takes a long time to train, then leaving n% out might be more appropriate.

The method involves eliminating one known rating and trying to predict it. If you want to remove n percent of the ratings, just choose them randomly from the whole dataset rather than choosing a specific number of every user's ratings. And keep n pretty small, on the order of 10% or less.
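As a rough illustration of the random n% holdout described above, here is a minimal sketch using only the Python standard library. The function name `holdout_split` and the `(user, movie, rating)` tuple format are assumptions made for this example, not something from MovieLens tooling. To cover the requirement in the question that every movie be seen in training, any held-out rating whose user or movie would otherwise vanish from the training set is moved back into training.

```python
import random

def holdout_split(ratings, test_fraction=0.1, seed=0):
    """Randomly hold out a fraction of (user, movie, rating) triples.

    Held-out ratings whose user or movie would not appear in the training
    set are moved back to training, so the final test share can end up
    slightly below test_fraction.
    """
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    train_users = {u for u, _, _ in train}
    train_movies = {m for _, m, _ in train}

    kept_test = []
    for u, m, r in test:
        if u in train_users and m in train_movies:
            kept_test.append((u, m, r))
        else:
            # Return the rating to training so the model sees this user/movie.
            train.append((u, m, r))
            train_users.add(u)
            train_movies.add(m)
    return train, kept_test
```

Calling `holdout_split(ratings, test_fraction=0.1)` on a list of rating triples returns `(train, test)`. Repeating the split with different `seed` values and averaging the error gives a steadier estimate than a single split.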

A good treatment of cross-validation methods for recommender systems is the paper "An Evaluation Methodology for Collaborative Recommender Systems". Generally:

Holdout is a method that splits a dataset into two parts: a training set and a test set. These sets could have different proportions. In the setting of recommender systems the partitioning is performed by randomly selecting some ratings from all (or some of) the users. The selected ratings constitute the test set, while the remaining ones are the training set. This method is also called leave-k-out. In [17], Sarwar et al. split the dataset into 80% training and 20% test data. In [18] several ratios among training and test (from 0.2 to 0.95 with an increment of 0.05) are chosen and for each one the experiment is repeated ten times with different training and test sets and finally the results are averaged. In [13] the test set is made by 10% of users: 5 ratings for each user in the test set are withheld.

Leave-one-out is a method obtained by setting k = 1 in the leave-k-out method. Given an active user, we withhold in turn one rated item. The learning algorithm is trained on the remaining data. The withheld element is used to evaluate the correctness of the prediction and the results of all evaluations are averaged in order to compute the final quality estimate. This method has some disadvantages, such as the overfitting and the high computational complexity. This technique is suitable to evaluate the recommending quality of the model for users who are already registered as members of the system. Karypis et al. [10] adopted a trivial version of the leave-one-out creating the test set by randomly selecting one of the non-zero entries for each user and the remaining entries for training. In [7], Breese et al. split the URM in training and test set and then, in the test set, withhold a single randomly selected rating for each user.
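The simplified leave-one-out variant mentioned in the quote (one randomly selected rating withheld per user) could be sketched roughly as follows. The function name and the `(user, movie, rating)` tuple format are assumptions for illustration. Keeping users with a single rating entirely in the training set is also a choice made here, not something the paper prescribes.

```python
import random
from collections import defaultdict

def leave_one_out_per_user(ratings, seed=0):
    """Withhold one randomly chosen rating per user as the test set."""
    rng = random.Random(seed)

    # Group the rating triples by user.
    by_user = defaultdict(list)
    for u, m, r in ratings:
        by_user[u].append((u, m, r))

    train, test = [], []
    for user_ratings in by_user.values():
        if len(user_ratings) < 2:
            # Assumption: a user with one rating stays in training,
            # otherwise the model would know nothing about them.
            train.extend(user_ratings)
            continue
        held = rng.randrange(len(user_ratings))
        for i, triple in enumerate(user_ratings):
            (test if i == held else train).append(triple)
    return train, test
```

This produces a single train/test pair. The full leave-one-out procedure in the quote withholds each rated item in turn and retrains, which is where the high computational cost comes from.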

A simple variant of the holdout method is the m-fold cross-validation. It consists in dividing the dataset into m independent folds (so that folds do not overlap). In turn, each fold is used exactly once as test set and the remaining folds are used for training the model. According to [20] and [11], the suggested number of folds is 10. This technique is suitable to evaluate the recommending capability of the model when new users (i.e., users do not already belong to the model) join the system. By choosing a reasonable number of folds we can compute mean, variance and confidence interval.
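For the m-fold scheme, a minimal sketch might look like the following. The generator name `m_fold_splits` is hypothetical, and the quote does not say whether the folds are formed over ratings or over users. Passing a list of rating triples gives rating-level folds. Passing a list of user IDs gives user-level folds, which is closer to the "new users" scenario the quote describes.

```python
import random

def m_fold_splits(items, m=10, seed=0):
    """Yield (train, test) pairs so that each item is in a test fold exactly once."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)

    # Deal the shuffled items into m non-overlapping folds.
    folds = [shuffled[i::m] for i in range(m)]

    for i in range(m):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

Training and scoring the model once per `(train, test)` pair yields m error values, from which the mean, variance and confidence interval mentioned above can be computed.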

Hope this helps!
