Item-based recommender using SVD
I have an item-item similarity matrix, for example (the real matrix is symmetric and much bigger):
1.00 0.88 0.96 0.99
0.88 1.00 0.99 0.96
0.96 0.99 1.00 0.86
0.99 0.96 0.86 1.00
I need to implement a recommender which, for a given set of items, recommends a new set of items.
I was thinking about using SVD to reduce the items to an n-dimensional space, let's say 50 dimensions, so that each item is represented by a vector of 50 numbers and the similarity between two items is the cosine similarity between their 50-dimensional vectors.
For a base set of items (which can get quite big), I hope I can calculate the average of their vectors and use it as the search query.
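To make the idea concrete, here is a rough sketch of the search step in plain Java, independent of Mahout. It assumes the reduced item vectors are already available as a `double[][]` (one row per item); the class name `CentroidSearch` and its method names are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CentroidSearch {

    // Mean of the reduced vectors of the base items (the "query" vector).
    static double[] centroid(double[][] vectors, int[] baseItems) {
        int dim = vectors[0].length;
        double[] mean = new double[dim];
        for (int item : baseItems) {
            for (int d = 0; d < dim; d++) {
                mean[d] += vectors[item][d];
            }
        }
        for (int d = 0; d < dim; d++) {
            mean[d] /= baseItems.length;
        }
        return mean;
    }

    // Cosine similarity; returns 0.0 if either vector has zero length.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int d = 0; d < a.length; d++) {
            dot += a[d] * b[d];
            normA += a[d] * a[d];
            normB += b[d] * b[d];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Items outside the base set, ordered by descending similarity to the centroid.
    static List<Integer> recommend(double[][] vectors, int[] baseItems, int howMany) {
        double[] query = centroid(vectors, baseItems);
        Set<Integer> base = new HashSet<>();
        for (int item : baseItems) {
            base.add(item);
        }
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < vectors.length; i++) {
            if (!base.contains(i)) {
                candidates.add(i);
            }
        }
        candidates.sort(Comparator.comparingDouble((Integer i) -> -cosine(query, vectors[i])));
        return candidates.subList(0, Math.min(howMany, candidates.size()));
    }

    public static void main(String[] args) {
        // Toy 2-dimensional vectors standing in for the 50-dimensional ones.
        double[][] vectors = { {1.0, 0.0}, {0.9, 0.1}, {0.0, 1.0}, {0.8, 0.2} };
        System.out.println(recommend(vectors, new int[] {0, 1}, 2));
    }
}
```

This is a brute-force scan over all items, which should be fine for a few tens of thousands of items; whether averaging the vectors gives good recommendations is exactly what I am unsure about.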
Is this a good idea? What is this procedure called? And can it be done in Mahout?
This is my code so far:
ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
Matrix m = new DenseMatrix(NUM_ITEMS, NUM_ITEMS);
// copy similarities to a matrix
for (int i = 0; i < NUM_ITEMS; i++) {
    // range(...) is a helper (not shown), assumed to return the item IDs as a long[]
    double[] similar = similarity.itemSimilarities(i, range(NUM_ITEMS));
    for (int j = 0; j < NUM_ITEMS; j++) {
        m.setQuick(i, j, similar[j]);
    }
}
// full SVD, then keep the first 50 columns of V as the reduced item vectors
Matrix v = new SingularValueDecomposition(m).getV();
Matrix reduced = v.viewPart(0, NUM_ITEMS, 0, 50);
The problem is that the SVD takes forever for NUM_ITEMS > 30. I don't know whether the issue is with the data or with the SVD implementation I'm using. The matrix m is symmetric; could that be an issue? I tried googling "demean matrix mahout" with no results. How should I preprocess the matrix so that the SVD runs faster? I will need NUM_ITEMS to be about 20,000 to 40,000 in the future. Is this a reasonable size for SVD?
Update: the problem was that the matrix contained a few NaN values, which is why the SVD never finished. After replacing these with 0.0, it works fine for a 1000 x 1000 matrix, and my recommendations are working like a charm. I'll still need to compute the SVD of a matrix with 20x more rows and columns. If anyone knows the easiest way to compute an (approximate) SVD of a 20,000 x 20,000 dense matrix, probably through some cloud parallel service (?), please let me know.
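For anyone hitting the same problem, the cleanup step amounts to roughly the following. This is a minimal sketch on a plain `double[][]` rather than the Mahout `Matrix` from my code above, and the class and method names are invented for the example.

```java
final class NanCleanup {

    // Replaces every NaN entry with 0.0 in place and returns how many were replaced.
    static int replaceNaNWithZero(double[][] matrix) {
        int replaced = 0;
        for (double[] row : matrix) {
            for (int j = 0; j < row.length; j++) {
                if (Double.isNaN(row[j])) {
                    row[j] = 0.0;
                    replaced++;
                }
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        // Tiny similarity matrix with two undefined entries.
        double[][] m = { {1.00, Double.NaN}, {Double.NaN, 1.00} };
        int replaced = replaceNaNWithZero(m);
        System.out.println("replaced " + replaced + " NaN entries");
    }
}
```

Counting the replacements is just a sanity check: if a large share of the similarities turn out to be NaN, treating them all as 0.0 may not be a good choice.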
PS. Thanks for the help!
Topic apache-mahout recommender-system
Category Data Science