Content-based recommendation on Mahout

Is it possible to get recommendations for similar products using Mahout?

For example:

I have a data set of movies with the following attributes:

Movie_name, Actor_1, Actor_2, Actress_1, Actress_2, Director, Theme, Language

Now, given a Movie_name, the system should recommend the top 3 similar movies based on these attributes.

Can this be done using Mahout? If yes, how?

Topic apache-mahout python recommender-system

Category Data Science


Generally, this is done with Mahout's spark-rowsimilarity job, which is a form of content-based recommendation. The process itself is quite simple. Here are the steps:

  1. For each movie, convert your categorical variables into columns. For example, say that actor_1 takes the values Brad Pitt, Daniel Craig, and Vin Diesel across different movies. These become three columns, with a 1 marking which movies feature each actor. Your movie matrix will look something like:

    Movie Name    , Has_Brad_Pitt, Has_Daniel_Craig, Has_Vin_Diesel, ...
    MI-6          ,       1      ,        0        ,       0       , ...
    Fast & Furious,       0      ,        0        ,       1       , ...
    Casino Royale ,       0      ,        1        ,       0       , ...
    
  2. Now, to find the similarity score of two movies, compute the dot product of their two vectors. With 0/1 columns this counts the attributes the two movies share, so the higher the value, the more similar they are.

The spark-rowsimilarity job can compute these row-to-row similarities in one shot. You may still have to do some work yourself to encode the categorical variables.
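
To make the two steps concrete, here is a minimal sketch in plain Python. It does not call Mahout at all. The movie titles, attribute values, and the helper names `one_hot`, `dot` and `top_similar` are all made up for illustration. It also assumes one column per (attribute, value) pair, so the same actor appearing as Actor_1 and as Actor_2 would land in different columns. Mahout's own job may score rows differently from a raw dot product.

```python
# Hypothetical sample data: titles and attribute values are placeholders.
movies = {
    "Movie A": {"Actor_1": "Brad Pitt", "Director": "Director X",
                "Theme": "Action", "Language": "English"},
    "Movie B": {"Actor_1": "Brad Pitt", "Director": "Director X",
                "Theme": "Drama", "Language": "English"},
    "Movie C": {"Actor_1": "Daniel Craig", "Director": "Director Y",
                "Theme": "Action", "Language": "English"},
    "Movie D": {"Actor_1": "Vin Diesel", "Director": "Director Z",
                "Theme": "Action", "Language": "French"},
    "Movie E": {"Actor_1": "Vin Diesel", "Director": "Director Z",
                "Theme": "Comedy", "Language": "French"},
}


def one_hot(movies):
    """Step 1: build one 0/1 column per (attribute, value) pair."""
    columns = sorted({(attr, val)
                      for attrs in movies.values()
                      for attr, val in attrs.items()})
    index = {col: i for i, col in enumerate(columns)}
    vectors = {}
    for name, attrs in movies.items():
        vec = [0] * len(columns)
        for attr, val in attrs.items():
            vec[index[(attr, val)]] = 1
        vectors[name] = vec
    return columns, vectors


def dot(u, v):
    """Step 2: the dot product counts the attributes two movies share."""
    return sum(a * b for a, b in zip(u, v))


def top_similar(name, vectors, k=3):
    """Rank every other movie by its dot product with the given movie."""
    target = vectors[name]
    scores = [(other, dot(target, vec))
              for other, vec in vectors.items() if other != name]
    # Highest score first; ties are broken alphabetically by title.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


columns, vectors = one_hot(movies)
print(top_similar("Movie A", vectors))
# [('Movie B', 3), ('Movie C', 2), ('Movie D', 1)]
```

A raw dot product favors movies that simply have more attributes filled in. If that matters for your data, normalizing the score (for example with cosine similarity) is a common alternative.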
