Item Based Collaborative Filtering with No Ratings
I am building a recommender for web pages. For each web page in our data set, we wish to generate a list of other pages that visitors of that page have also visited.
Our data only shows whether a user has visited a page or not. Users do not provide any ratings of our web pages. This seems like a good task for item-based recommendation. However, most of the implementations I have found (such as the one in Mahout) require rating data.
The first solution I came up with was to use a graph database and write a query which does the following:
For each page we want recommendations for, we search for all the users who have viewed that page. Then, for each of those users, we look up all the other pages they have viewed. We then count the number of users who have viewed each page in this result set, and use the pages with the highest counts as our recommendations.
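To make the query concrete, here is a minimal in-memory sketch of the same co-visit count in Python. It is not our actual graph query: the function name `co_visited_pages`, the `(user, page)` pair format, and the sample data are all made up for illustration.

```python
from collections import Counter, defaultdict


def co_visited_pages(visits, page, top_n=3):
    """Rank other pages by how many visitors of `page` also viewed them.

    `visits` is an iterable of (user, page) pairs (a hypothetical input
    format standing in for the edges of the graph database).
    """
    # Index the visits in both directions so each lookup is a dict access.
    pages_by_user = defaultdict(set)
    users_by_page = defaultdict(set)
    for user, visited in visits:
        pages_by_user[user].add(visited)
        users_by_page[visited].add(user)

    # For every user who viewed `page`, count each other page once per user.
    counts = Counter()
    for user in users_by_page.get(page, ()):
        for other in pages_by_user[user]:
            if other != page:
                counts[other] += 1

    # Highest co-visit counts first.
    return counts.most_common(top_n)


# Made-up sample data: four users, four pages.
sample_visits = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "A"), ("u3", "D"),
    ("u4", "B"), ("u4", "D"),
]
recommendations = co_visited_pages(sample_visits, "A")
print(recommendations)
```

Using sets means a user who viewed the same page several times is still counted once, which matches the "number of users" count described above.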
While this works pretty well, our data set has grown substantially and scaling the graph database is difficult. The queries become slower as the number of page views in our data set increases. We would like to consider a different implementation before we commit to moving to a distributed graph database.
In a more traditional item-based recommender (like Mahout's), is there a good way to 'fake' the rating data, or is there a popular open source implementation which does not require rating data?
Topic apache-mahout recommender-system open-source
Category Data Science