Train-Test split for a recommender system
In all implementations of recommender systems I've seen so far, the train-test split is performed in this manner:
+------+------+--------+
| user | item | rating |
+------+------+--------+
| u1   | i1   | 2.3    |
| u2   | i2   | 5.3    |
| u1   | i4   | 1.0    |
| u3   | i5   | 1.6    |
| ...  | ...  | ...    |
+------+------+--------+
This is transformed into a rating matrix of the form:
+------+-------+-------+-------+-------+-------+-----+
| user | item1 | item2 | item3 | item4 | item5 | ... |
+------+-------+-------+-------+-------+-------+-----+
| u1   | 2.3   | 1.7   | 0.5   | 1.0   | NaN   | ... |
| u2   | NaN   | 5.3   | 1.0   | 0.2   | 4.3   | ... |
| u3   | NaN   | NaN   | 2.1   | 1.3   | 1.6   | ... |
| ...  | ...   | ...   | ...   | ...   | ...   | ... |
+------+-------+-------+-------+-------+-------+-----+
where NaN indicates that the user has not rated that particular item.
Now, from each row (user) of the matrix, a certain percentage of the numeric (non-NaN) values is removed and set aside into a new matrix, representing the test set. The model is then trained on the initial matrix with the test samples removed, and the goal of the recommender is to fill in the missing values with the smallest possible error.
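To make the per-user holdout concrete, here is a minimal NumPy sketch of that split (the function name and the 20% default are my own choices, not from any particular library):

```python
import numpy as np

def per_user_holdout(ratings, test_frac=0.2, seed=0):
    """Hold out a fraction of each user's observed ratings as a test matrix.

    ratings: 2-D array of shape (n_users, n_items), NaN = unrated.
    Returns (train, test) of the same shape: held-out entries are
    NaN in `train` and keep their numeric value in `test`.
    """
    rng = np.random.default_rng(seed)
    train = ratings.copy()
    test = np.full_like(ratings, np.nan)
    for u in range(ratings.shape[0]):
        observed = np.flatnonzero(~np.isnan(ratings[u]))   # rated items
        n_test = max(1, int(len(observed) * test_frac))    # at least one
        held_out = rng.choice(observed, size=n_test, replace=False)
        test[u, held_out] = ratings[u, held_out]
        train[u, held_out] = np.nan                        # hide from training
    return train, test
```

The recommender is then fit on `train`, and its predictions at the positions that are numeric in `test` are compared against the held-out values (e.g. with RMSE).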
My question is: can the train-test split somehow be done user-wise instead? For example, keep a set of users separate, train the recommender on the remaining users, and then try to predict the ratings for the held-out users. I know this goes a bit against the idea that "if a recommender does not know you, it cannot recommend something you like", but I am wondering whether something like k-NN could work here.
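For what I have in mind, a held-out user would still reveal a few ratings at prediction time, and a user-based k-NN would fill in the rest. A rough sketch of that idea (the helper name, the cosine similarity on raw ratings, and the fallbacks are my own assumptions, not an established recipe):

```python
import numpy as np

def predict_for_new_user(train, new_user, k=2):
    """Predict a held-out user's missing ratings via user-based k-NN.

    train: (n_users, n_items) rating matrix of known users, NaN = unrated.
    new_user: length-n_items vector with a few observed ratings, NaN elsewhere.
    Similarity: cosine over the items both users have rated (a simple choice;
    mean-centering per user would be a common refinement).
    """
    n_users = train.shape[0]
    sims = np.zeros(n_users)
    for u in range(n_users):
        common = ~np.isnan(train[u]) & ~np.isnan(new_user)
        if not common.any():
            continue                       # no overlap, similarity stays 0
        a, b = train[u, common], new_user[common]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims[u] = (a @ b) / denom if denom > 0 else 0.0
    neighbours = np.argsort(sims)[::-1][:k]  # k most similar known users
    pred = new_user.copy()
    for i in np.flatnonzero(np.isnan(new_user)):
        rated = [u for u in neighbours if not np.isnan(train[u, i])]
        if not rated:
            continue                       # no neighbour rated item i
        w = sims[rated]
        if w.sum() > 0:
            pred[i] = (w @ train[rated, i]) / w.sum()  # weighted average
    return pred
```

The user-wise split itself is then trivial: partition the rows of the rating matrix into train users and test users before calling anything like the function above.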
Topic software-recommendation dataset recommender-system
Category Data Science