Creating a data model for a Mahout recommendation engine

I am trying to build an item-item similarity recommendation engine with Mahout. The data set is in the following format (attributes are text, not numerals):

name : category : cost : ingredients
x : xx1 : 15 : xxx1, xxx2, xxx3
y : yy1 : 14 : yyy1, yyy2, yyy3
z : xx1 : 12 : xxx1, xxy1

So in order to use this data set to train Mahout, what is the right …
Category: Data Science
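
One possible way to prepare such text attributes, sketched under assumptions that are not in the original post: Mahout's file-based data model reads CSV lines of numeric `userID,itemID` pairs, so each attribute value can be treated as a pseudo-user that "prefers" every item carrying it. The function name `encode_as_boolean_prefs` and the `category=` / `ingredient=` prefixes are hypothetical, and the numeric cost column is left out because it would need bucketing first.

```python
def encode_as_boolean_prefs(rows):
    """Turn (name, category, ingredients) rows into 'featureID,itemID' CSV lines.

    Assumption: each distinct attribute value acts as a pseudo-user, so the
    output has the numeric userID,itemID shape of a boolean preference file.
    """
    item_ids = {}      # item name -> numeric ID, assigned in order of appearance
    feature_ids = {}   # attribute value -> numeric ID, assigned in order of appearance
    lines = []
    for name, category, ingredients in rows:
        item = item_ids.setdefault(name, len(item_ids) + 1)
        features = ["category=" + category]
        features += ["ingredient=" + ingredient for ingredient in ingredients]
        for feature in features:
            feature_id = feature_ids.setdefault(feature, len(feature_ids) + 1)
            lines.append(f"{feature_id},{item}")
    return lines, item_ids, feature_ids


# The three rows from the question, with the cost column dropped.
rows = [
    ("x", "xx1", ["xxx1", "xxx2", "xxx3"]),
    ("y", "yy1", ["yyy1", "yyy2", "yyy3"]),
    ("z", "xx1", ["xxx1", "xxy1"]),
]
lines, item_ids, feature_ids = encode_as_boolean_prefs(rows)
```

Items x and z then share two pseudo-users (the category xx1 and the ingredient xxx1), which is what a boolean item similarity would pick up. Keep the two ID maps so that recommendations can be translated back to names.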

Using Spark for finding similar users to a user?

I read about https://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html but couldn't find a Spark library for this implementation. I have a columnar string dataset of around 15-20 million users with their show_watched, times_watched, genre, channel and some more columns. I need to calculate lookalikes for a user (or for 100k users). How do I find lookalikes for them in less time? I have tried indexing the data in Solr and then using Solr MLT to find similar users, but that takes a …
Category: Data Science

Mahout Spark shell not working

I installed Hadoop, Mahout and Spark. I am able to see the Hadoop and Spark master web UIs. Moreover, I can also run the following command:

[hadoop@muildevcel01 mahout]$ bin/mahout

However, when I try running the spark-shell, I run into the problem shown below:

[hadoop@muildevcel01 mahout]$ bin/mahout spark-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/repl/SparkILoop
    at org.apache.mahout.sparkbindings.shell.Main.main(Main.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.repl.SparkILoop
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more

Question: Any suggestions on how I could resolve my problem?
Category: Data Science

Recommendation for boolean dataset with apache mahout

I was trying to implement an item-based recommender system with a boolean dataset. Dataset example:

User-id | movie-id | Action | Comedy | Drama
1 | 200 | 0 | 1 | 1
2 | 210 | 1 | 1 | 0

I tried implementing it with an item-based similarity algorithm as follows:

package prediction.contentrecommender;

import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender;

public class ContentRecommender {

    /**
     * @param args
     */
    public static …
Category: Data Science

Item Based Collaborative Filtering with No Ratings

I am building a recommender for web pages. For each web page in our data set, we wish to generate a list of web pages that other users have also visited. Our data only shows whether a user has visited a page or not. Users do not provide any ratings of our web pages. This is a good task for item-based recommendation. However, most of the algorithms (such as the one in Mahout) require rating data. …
Category: Data Science
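
A minimal sketch of item-based similarity on visit-only data, assuming the input is a list of `(user, page)` pairs. It uses the Tanimoto (Jaccard) coefficient, which needs only set membership and no ratings. The function names and the sample page names in the usage note are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tanimoto_item_similarity(visits):
    """Return {(page_a, page_b): similarity} for page pairs sharing at least one visitor."""
    users_by_page = defaultdict(set)
    for user, page in visits:
        users_by_page[page].add(user)

    sims = {}
    for a, b in combinations(sorted(users_by_page), 2):
        shared = len(users_by_page[a] & users_by_page[b])
        if shared:
            either = len(users_by_page[a] | users_by_page[b])
            sims[(a, b)] = shared / either  # |A and B| / |A or B|
    return sims


def also_visited(page, sims, top_n=3):
    """Pages most similar to `page`, best first; ties broken by page name."""
    scored = []
    for (a, b), score in sims.items():
        if a == page:
            scored.append((score, b))
        elif b == page:
            scored.append((score, a))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]
```

For example, with visits `[("u1", "home"), ("u1", "pricing"), ("u2", "home"), ("u2", "pricing"), ("u2", "blog"), ("u3", "home"), ("u3", "blog")]`, the pages "home" and "pricing" share two of three visitors and score 2/3. The all-pairs loop is quadratic in the number of pages, so this is only meant to illustrate the idea on small data.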

Interpretation of Similarity Number generated by LogLikelihood in Mahout

I have a pretty basic question and I was hoping someone could help me. I’m not a math person and I’m fairly new to Mahout, so I’m looking for a poor man’s explanation. It is a typical order recommendation system. I have a database with around 699,445 orders. These orders have items which were “purchased”. I ran the following Mahout command:

mahout itemsimilarity --input /mnt/p1.csv --output ./output --similarityClassname SIMILARITY_LOGLIKELIHOOD --booleanData TRUE --threshold 0.9

I decided to spot-check the results. I …
Category: Data Science
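
A sketch of how a log-likelihood similarity value can be computed from co-occurrence counts, which may help when reading the output numbers. It assumes the usual formulation: a log-likelihood ratio (LLR) over a 2x2 table of order counts, mapped into the range 0 to 1 as `1 - 1 / (1 + LLR)`. As far as I recall this mirrors what Mahout does, but treat that as an assumption and check it against the Mahout version in use. The count names `k11` to `k22` are the conventional ones: orders with both items, with only the first, with only the second, and with neither.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR for a 2x2 co-occurrence table; 0 means the items look independent."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


def llr_similarity(k11, k12, k21, k22):
    """Map the unbounded LLR into [0, 1)."""
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))
```

Under this mapping the value is not a probability or a percentage of shared orders. It grows with how surprising the co-occurrence is compared with independence, and a similarity of 0.9 corresponds to an LLR of 9.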

Mahout clusterdump top terms meaning

I apologize if this has been asked before, and I feel that it may be obvious, but I am wondering exactly what the numerical value below from clusterdump means:

Top Terms: monkey => 0.8170868432876803

I believe that to be the center of the centroid. But if the term vectors were created with term frequencies, could one interpret this as the average occurrence of "monkey" in the documents that are considered part of the cluster? In this case, "monkey" …
Category: Data Science
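
A small illustration of the "average occurrence" reading, assuming a k-means style centroid (the component-wise mean of the member vectors) and raw, unnormalized term-frequency vectors. The function names and the sample documents in the usage note are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of sparse term-frequency vectors given as dicts."""
    terms = set().union(*vectors)
    n = len(vectors)
    return {term: sum(v.get(term, 0.0) for v in vectors) / n for term in terms}


def top_terms(center, n=10):
    """Terms with the largest centroid weight; ties broken alphabetically."""
    return sorted(center.items(), key=lambda item: (-item[1], item[0]))[:n]
```

For three documents `{"monkey": 2, "banana": 1}`, `{"monkey": 1}` and `{"banana": 1, "tree": 3}`, the centroid weight of "monkey" is (2 + 1 + 0) / 3 = 1.0, which is the mean term frequency across the cluster. If the vectors were TF-IDF weighted or length-normalized, the number would be a mean of those weights instead and could no longer be read as an occurrence count.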

Content based recommendation on Mahout

Is it possible to get recommendations for similar products using Mahout? For example, I have a data set of movies with the following attributes: Movie_name, Actor_1, Actor_2, Actress_1, Actress_2, Director, Theme, Language. Given a Movie_name, the system should recommend the top 3 similar movies based on the attributes. Can this be done using Mahout? If yes, how?
Category: Data Science
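
A minimal content-based sketch, independent of Mahout, assuming each movie is a dict keyed by the column names from the question. Each movie becomes a set of `role=value` attributes, and movies are ranked by cosine similarity between those binary sets. The helper names are hypothetical, and numbered columns such as Actor_1 and Actor_2 are merged into one role so that billing order does not matter.

```python
import math


def role(field):
    """Strip a trailing numeric suffix: 'Actor_1' -> 'Actor', 'Director' -> 'Director'."""
    base, _, suffix = field.rpartition("_")
    return base if suffix.isdigit() else field


def attribute_set(movie):
    """All 'role=value' attributes of a movie, excluding its name."""
    return {f"{role(field)}={value}"
            for field, value in movie.items() if field != "Movie_name"}


def cosine(a, b):
    """Cosine similarity between two binary attribute sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar_movies(name, movies, top_n=3):
    """Names of the top_n movies most similar to `name`; ties broken by name."""
    by_name = {m["Movie_name"]: attribute_set(m) for m in movies}
    target = by_name[name]
    scored = [(cosine(target, attrs), other)
              for other, attrs in by_name.items() if other != name]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]
```

Every attribute counts equally here. Weighting rarer attributes more heavily, for example a shared director over a shared language, is a natural refinement.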

Mimic a Mahout like system

I have a data set, in Excel format, with account names, reported symptoms, a determined root cause and a date in month-year format for each row. I am trying to implement a Mahout-like system with the purpose of determining the symptoms an account is likely to report, using a kind of user-based similarity. Technically, I am just hoping to tweak the recommendation system into a deterministic system to spot the probable symptoms an account …
Category: Data Science

collaborative filtering using graph and machine learning

What are the advantages and disadvantages of collaborative-filtering-based recommendation using a machine learning approach versus a graph-based approach? Say I have user purchase data (user_name, user_location, user_company_name, product_name, product_price, product_ingredients) and would like to recommend products to a user based on what other users from the same location or company are buying, and on product price, ingredients, etc. How do I decide which of them is suitable for a given use case? I would like to evaluate Neo4j (Graph …
Category: Data Science

Parameters for OnlineLogisticRegression function in Mahout

Can anyone tell me where I can find documentation for parameters like:

-stepOffset
-alpha
-decayExponent

in the OnlineLogisticRegression function in Mahout? I am interested in what they change in calls like this one:

int FEATURES = 10000;
OnlineLogisticRegression learningAlgorithm = new OnlineLogisticRegression(20, FEATURES, new L1())
    .alpha(1).stepOffset(1000).decayExponent(0.9).lambda(3.0e-5).learningRate(20);
Category: Data Science
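
A sketch of how these parameters appear to combine into the per-step learning rate. The formula is my recollection of the Mahout source, not a documented guarantee, so verify it against the version in use: `learningRate` is the initial rate, `alpha` is a per-step exponential decay factor, and `stepOffset` with `decayExponent` give a polynomial decay that starts flat. The Python function name is hypothetical.

```python
def current_learning_rate(step, learning_rate, alpha, step_offset, decay_exponent):
    """Assumed schedule: learning_rate * alpha**step * (step + step_offset)**(-|decay_exponent|)."""
    forgetting_exponent = -abs(decay_exponent)  # the exponent is applied with a negative sign
    return learning_rate * (alpha ** step) * ((step + step_offset) ** forgetting_exponent)


# Values from the call in the question: alpha=1, stepOffset=1000, decayExponent=0.9, learningRate=20.
rates = [current_learning_rate(step, 20, 1, 1000, 0.9) for step in (0, 1000, 10000)]
```

Under this reading, `alpha(1)` switches the exponential decay off, a larger `stepOffset` keeps the rate nearly constant for the first training examples, and a larger `decayExponent` makes it fall faster afterwards. `lambda` is separate from the schedule: it scales the regularization applied by the prior, which is `L1` in the call above.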

User profiling with Mahout from categorized user behavior

I'm trying to cluster and classify users with Mahout. At the moment I am in the planning phase, my head is full of competing ideas, and since I'm relatively new to the area I'm stuck at the data formatting. Let's say we have two data tables (big enough). In the first table there are users and their actions. Every user has at least one action, and some can have a great many actions. About 10000 different user_actions and millions of …
Category: Data Science

Item based recommender using SVD

I have an item-item similarity matrix, e.g. (the matrix is symmetric, and much bigger):

1.00 0.88 0.96 0.99
0.88 1.00 0.99 0.96
0.96 0.99 1.00 0.86
0.99 0.96 0.86 1.00

I need to implement a recommender which, for a set of items, recommends a new set of items. I was thinking about using SVD to reduce the items to an n-dimensional space, let's say a 50-dimensional space, so each item is represented by a vector of 50 numbers, and similarity between two items is …
Category: Data Science
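
Before reaching for SVD, a simple baseline may be useful for comparison. The sketch below does not use SVD at all: it scores each candidate item by its mean similarity to the items in the input set, read straight from the matrix. The function name `recommend` and the use of zero-based item indices are assumptions.

```python
def recommend(similarity, basket, top_n=2):
    """Rank items outside `basket` by their mean similarity to the items in it."""
    basket = set(basket)
    if not basket:
        return []
    scores = {}
    for candidate in range(len(similarity)):
        if candidate in basket:
            continue
        scores[candidate] = sum(similarity[candidate][i] for i in basket) / len(basket)
    # Highest score first; ties broken by the lower item index.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]


# The 4x4 example matrix from the question.
S = [
    [1.00, 0.88, 0.96, 0.99],
    [0.88, 1.00, 0.99, 0.96],
    [0.96, 0.99, 1.00, 0.86],
    [0.99, 0.96, 0.86, 1.00],
]
```

With this matrix, `recommend(S, [0])` ranks item 3 (0.99) ahead of item 2 (0.96). An SVD-based variant would replace the direct lookups with similarities computed between low-dimensional item vectors, and this baseline gives something to compare it against.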
