Item-based recommender using SVD

I have an item-item similarity matrix, e.g. (the real matrix is symmetric and much bigger):

1.00 0.88 0.96 0.99 
0.88 1.00 0.99 0.96 
0.96 0.99 1.00 0.86 
0.99 0.96 0.86 1.00 

I need to implement a recommender that, for a given set of items, recommends a new set of items.

I was thinking about using SVD to reduce the items to an n-dimensional space, say 50 dimensions, so each item is represented by a vector of 50 numbers, and the similarity between two items is calculated as the cosine similarity between their 50-dimensional vectors.

For a base set of items (which can get quite big), I hope I can calculate the average of their vectors and use it for the search.

Is this a good idea? What is this procedure called? And can it be done in Mahout?


EDIT:

This is my code so far:

ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
Matrix m = new DenseMatrix(NUM_ITEMS, NUM_ITEMS);
// copy similarities to a matrix
for (int i = 0; i < NUM_ITEMS; i++) {
    double[] similar = similarity.itemSimilarities(i, range(NUM_ITEMS));
    for (int j = 0; j < NUM_ITEMS; j++) {
        m.setQuick(i, j, similar[j]);
    }
}
// decompose and keep the first 50 columns of V as the reduced item vectors
Matrix v = new SingularValueDecomposition(m).getV();
Matrix reduced = v.viewPart(0, NUM_ITEMS, 0, 50);
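
Given reduced, the averaging-and-cosine step from my question looks roughly like this (baseItems and the top-N selection are just placeholders for my base set and ranking):

// uses org.apache.mahout.math.Vector and org.apache.mahout.math.DenseVector
// average the reduced vectors of the base items into one query vector
Vector centroid = new DenseVector(50);
for (int item : baseItems) {               // baseItems: indices of the base set
    centroid = centroid.plus(reduced.viewRow(item));
}
centroid = centroid.divide(baseItems.length);

// score every item by cosine similarity to the centroid
double[] scores = new double[NUM_ITEMS];
for (int i = 0; i < NUM_ITEMS; i++) {
    Vector row = reduced.viewRow(i);
    scores[i] = centroid.dot(row) / (centroid.norm(2) * row.norm(2));
}
// sort indices by score (descending), drop the base items, take the top N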

The problem is, the SVD is taking forever for NUM_ITEMS > 30. I don't know if there is an issue with the data or with the SVD implementation I'm using. The matrix m is symmetric; could that be an issue? I tried googling "demean matrix mahout" with no results. How should I preprocess the matrix for the SVD to run faster? I will need NUM_ITEMS to be about 20,000-40,000 in the future. Is that a reasonable size for SVD?


EDIT 2:

The problem was that the matrix contained a few NaN values; that's why the SVD was taking practically infinite time. After replacing these with 0.0, it works fine for a 1000 x 1000 matrix, and my recommendations work like a charm. I'll still need to compute the SVD of a matrix with 20x more rows and columns. If anyone knows the easiest way to compute an (approximate) SVD of a 20,000 x 20,000 dense matrix, possibly through some parallel cloud service (?), please let me know.
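
For anyone hitting the same thing, the fix amounts to a guard in the copy loop above, something like:

// replace NaN similarities with 0.0 before decomposing
double value = similar[j];
m.setQuick(i, j, Double.isNaN(value) ? 0.0 : value);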

PS: Thanks for the help!

Topic apache-mahout recommender-system

Category Data Science


Now you face the problem of scalability.

Have you ever heard of random projection? Here is my suggestion:

Let's say $S$ is your item-item relation matrix, whose dimension is too large to decompose directly. You can multiply it by a smaller, randomly generated matrix $U$ and get $R = SU$: if $S$ is $n \times n$ and $U$ is $n \times k$ with $k \ll n$, then $R$ is only $n \times k$.

Then just do the SVD of $R$: if $U$ is invertible and the difference (l2-norm, or any norm you like) between $R$ and $S$ is small, it might help.
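
A rough sketch with Mahout's math classes (here s stands for $S$; the target dimension k, the seed, and the Gaussian scaling are just illustrative choices):

// uses org.apache.mahout.math.* and java.util.Random
int n = NUM_ITEMS;                 // s is the n x n item-item matrix
int k = 100;                       // target dimension, k << n
Random rng = new Random(42);
Matrix u = new DenseMatrix(n, k);
for (int i = 0; i < n; i++) {
    for (int j = 0; j < k; j++) {
        // random Gaussian entries, scaled so projected lengths stay comparable
        u.setQuick(i, j, rng.nextGaussian() / Math.sqrt(k));
    }
}
Matrix r = s.times(u);             // R = S * U, only n x k
SingularValueDecomposition svd = new SingularValueDecomposition(r);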

As for the second issue, the NaN values:

If the matrix is dense, try to complete it (using the mean?).
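
For example, filling each NaN with the mean of the non-NaN entries in its row (a column or global mean works the same way; s and n as above):

// complete NaN entries with the row mean of the non-NaN entries
for (int i = 0; i < n; i++) {
    double sum = 0.0;
    int count = 0;
    for (int j = 0; j < n; j++) {
        double v = s.getQuick(i, j);
        if (!Double.isNaN(v)) { sum += v; count++; }
    }
    double mean = count > 0 ? sum / count : 0.0;
    for (int j = 0; j < n; j++) {
        if (Double.isNaN(s.getQuick(i, j))) {
            s.setQuick(i, j, mean);
        }
    }
}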


"I need to implement a recommender that, for a given set of items, recommends a new set of items."

"Is this a good idea?"

Have you looked into Association Rule Mining? If you're open to other procedures, it's the first one that came to mind for recommenders based on sets of items. For those not familiar: it is a simple method retail shops use to determine things like "75% of customers who bought A and B also bought C". Among these algorithms, Apriori is straightforward and easy to implement, and may get you what you need; a sketch of its first (pair-level) pass follows.
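
A minimal sketch of that pair-level pass (transactions, MIN_SUPPORT and MIN_CONFIDENCE are placeholders; a full Apriori would iterate on to larger itemsets):

// uses java.util.Map, java.util.HashMap, java.util.Set
// count single items and unordered item pairs across all baskets
Map<Integer, Integer> itemCounts = new HashMap<>();
Map<Long, Integer> pairCounts = new HashMap<>();
for (Set<Integer> basket : transactions) {
    for (int a : basket) {
        itemCounts.merge(a, 1, Integer::sum);
        for (int b : basket) {
            if (a < b) {
                long key = ((long) a << 32) | b;   // encode the pair (a, b)
                pairCounts.merge(key, 1, Integer::sum);
            }
        }
    }
}
// keep frequent pairs and report rules "a -> b" with their confidence
for (Map.Entry<Long, Integer> e : pairCounts.entrySet()) {
    if (e.getValue() < MIN_SUPPORT) continue;
    int a = (int) (e.getKey() >> 32);
    int b = e.getKey().intValue();                 // lower 32 bits
    double conf = (double) e.getValue() / itemCounts.get(a);
    if (conf >= MIN_CONFIDENCE) {
        System.out.printf("%d -> %d (confidence %.2f)%n", a, b, conf);
    }
}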
