I am trying to build an item-item similarity matching recommendation engine with Mahout. The data set is in the following format (the attributes are text, not numeric):

name : category : cost : ingredients
x : xx1 : 15 : xxx1, xxx2, xxx3
y : yy1 : 14 : yyy1, yyy2, yyy3
z : xx1 : 12 : xxx1, xxy1

So, in order to use this data set to train Mahout, what is the right …
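One encoding I have seen used for this kind of text-attribute data (a sketch only, not an official Mahout input format) is to explode each record into (item, attribute-token) pairs, so an item-similarity job can treat the attribute tokens as the "interactions". The tab-separated layout and the handling of the numeric cost field below are my assumptions:

// Sketch only: turn each "name : category : cost : ingredients" record into
// (item, attribute-token) pairs, one per line. An itemsimilarity-style job can
// then treat attribute tokens as the things an item "co-occurs" with.
public class RecordToPairs {
  public static void main(String[] args) {
    String record = "x : xx1 : 15 : xxx1, xxx2, xxx3";
    String[] fields = record.split(":");
    String item = fields[0].trim();
    System.out.println(item + "\t" + "category=" + fields[1].trim());
    System.out.println(item + "\t" + "cost=" + fields[2].trim());   // bucketing the cost may work better
    for (String ing : fields[3].split(",")) {
      System.out.println(item + "\t" + "ingredient=" + ing.trim());
    }
  }
}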
Which Python libraries are equivalent to, or more advanced than, Mahout for building recommendation systems with collaborative filtering and content-based filtering? Also, is there a way to integrate Mahout with Python?
I read about https://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html but couldn't find a Spark library for this implementation. I have a columnar string dataset of around 15-20 million users, with columns such as show_watched, times_watched, genre, channel, and a few more. I need to calculate lookalikes for a user (or for 100k users). How do I find lookalikes for them quickly? I have tried indexing the data in Solr and then using Solr MLT to find similar users, but that takes a …
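For a first cut, here is a rough sketch of representing each user as a Mahout sparse vector and scoring candidates by cosine similarity; the hashed feature space and using times_watched as the weight are my assumptions, and at 15-20 million users you would still need blocking or an approximate-nearest-neighbour index on top of this rather than brute force over all pairs:

import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class LookalikeSketch {
  static final int DIM = 1 << 20;                 // hashed feature space

  // Build a sparse user vector from (feature name -> weight) pairs.
  static Vector userVector(java.util.Map<String, Double> features) {
    Vector v = new RandomAccessSparseVector(DIM);
    for (java.util.Map.Entry<String, Double> e : features.entrySet()) {
      int idx = Math.abs(e.getKey().hashCode()) % DIM;
      v.set(idx, v.get(idx) + e.getValue());
    }
    return v;
  }

  static double cosine(Vector a, Vector b) {
    return a.dot(b) / (a.norm(2) * b.norm(2) + 1e-9);
  }

  public static void main(String[] args) {
    java.util.Map<String, Double> u1 = new java.util.HashMap<>();
    u1.put("show:breaking_bad", 12.0);            // times_watched as weight
    u1.put("genre:drama", 1.0);
    java.util.Map<String, Double> u2 = new java.util.HashMap<>();
    u2.put("show:breaking_bad", 3.0);
    u2.put("channel:amc", 1.0);
    System.out.println(cosine(userVector(u1), userVector(u2)));
  }
}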
I installed Hadoop, Mahout and Spark. I am able to see the Hadoop and Spark MasterWebUI. Moreover, I can also run the following command:

[hadoop@muildevcel01 mahout]$ bin/mahout

However, when I try running the Spark shell I run into the problem below:

[hadoop@muildevcel01 mahout]$ bin/mahout spark-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/repl/SparkILoop
    at org.apache.mahout.sparkbindings.shell.Main.main(Main.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.repl.SparkILoop
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more

Question: any suggestions on how I can resolve this problem?
I am building a recommender for web pages. For each web page in our data set, we wish to generate a list of web pages that other users have also visited. Our data only shows whether a user has visited a page or not; users do not provide any ratings of our web pages. This looks like a good task for item-based recommendation. However, most of the algorithms (such as the one in Mahout) require rating data. …
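For what it's worth, Mahout's Taste API does handle this case if you pair an item-based recommender with a similarity measure that ignores preference values. A minimal sketch, assuming a "userID,itemID" CSV (the file name and the page id 42 are made up):

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;

public class PageRecommenderSketch {
  public static void main(String[] args) throws Exception {
    // visits.csv: one "userID,itemID" line per page visit, no rating column
    DataModel model = new FileDataModel(new File("visits.csv"));
    // log-likelihood similarity only looks at co-occurrence, not rating values
    LogLikelihoodSimilarity similarity = new LogLikelihoodSimilarity(model);
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);
    // "pages other users also visited" for page 42
    System.out.println(recommender.mostSimilarItems(42L, 10));
  }
}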
I have Cloudera CDH5 running inside a VirtualBox VM. When I try to run mahout spark-itemsimilarity .... I get the error: Unknown program 'spark-itemsimilarity' chosen. Do I have to install any additional package to run spark-itemsimilarity? Any help would be appreciated!
I have a pretty basic question and I was hoping someone could help me. I’m not a math person and I’m fairly new to Mahout, so I’m looking for a poor man’s explanation. It is a typical order recommendation system. I have a database with around 699,445 orders. These orders have items which were “purchased”. I ran the following Mahout command:

mahout itemsimilarity --input /mnt/p1.csv --output ./output --similarityClassname SIMILARITY_LOGLIKELIHOOD --booleanData TRUE --threshold 0.9

I decided to spot-check the results. I …
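For spot-checking individual pairs, the raw log-likelihood ratio can be recomputed from the 2x2 co-occurrence counts; the counts below are made-up numbers that sum to the 699,445 orders, and the value the job reports may be a normalized transform of this raw LLR, so compare trends rather than exact numbers:

import org.apache.mahout.math.stats.LogLikelihood;

public class LlrSpotCheck {
  public static void main(String[] args) {
    // k11 = orders with both items, k12 = orders with item A only,
    // k21 = orders with item B only, k22 = orders with neither
    long k11 = 120, k12 = 380, k21 = 250, k22 = 698_695;
    double llr = LogLikelihood.logLikelihoodRatio(k11, k12, k21, k22);
    System.out.println("raw LLR = " + llr);
  }
}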
I apologize that this has been asked before and I feel that it may be obvious, but I am wondering exactly what the numerical value below from clusterdump means:

Top Terms: monkey => 0.8170868432876803

I believe that to be the center of the centroid. But if the term vectors were created with term frequencies, could one interpret this as the average occurrence of "monkey" in the documents that are considered part of the cluster? In this case, "monkey" …
Is it possible to get recommendations for similar products using Mahout? E.g., I have a data set of movies with the following attributes:

Movie_name, Actor_1, Actor_2, Actress_1, Actress_2, Director, Theme, Language

Now, given a Movie_name, the system should recommend the top 3 similar movies based on the attributes. Can this be done using Mahout? If yes, how?
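This is closer to content-based similarity than collaborative filtering. As a sketch of the underlying idea only (placeholder attribute values, plain Jaccard overlap rather than any particular Mahout class): encode each movie as its set of attribute tokens and rank the others by overlap with the query movie.

import java.util.*;

public class SimilarMoviesSketch {
  // Jaccard similarity between two attribute sets
  static double jaccard(Set<String> a, Set<String> b) {
    Set<String> inter = new HashSet<>(a); inter.retainAll(b);
    Set<String> union = new HashSet<>(a); union.addAll(b);
    return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
  }

  public static void main(String[] args) {
    Map<String, Set<String>> movies = new HashMap<>();
    movies.put("MovieA", new HashSet<>(Arrays.asList("actor1", "actress1", "dir1", "action", "en")));
    movies.put("MovieB", new HashSet<>(Arrays.asList("actor1", "actress2", "dir1", "action", "en")));
    movies.put("MovieC", new HashSet<>(Arrays.asList("actor9", "actress9", "dir9", "drama", "fr")));
    String query = "MovieA";
    movies.entrySet().stream()
        .filter(e -> !e.getKey().equals(query))
        .sorted((x, y) -> Double.compare(jaccard(movies.get(query), y.getValue()),
                                         jaccard(movies.get(query), x.getValue())))
        .limit(3)
        .forEach(e -> System.out.println(e.getKey()));
  }
}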
I have a data set, in Excel format, with account names, reported symptoms, a determined root cause, and a date in month-year format for each row. I am trying to implement a Mahout-like system with the purpose of determining the likely symptoms an account may report, by doing a user-based-similarity kind of thing. Technically, I am just hoping to tweak the recommendation system into a deterministic system to spot the probable symptoms an account …
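Whatever recommender ends up being used, the first hurdle with an Excel export of text fields is that Mahout's Taste data models expect numeric user and item ids, so a dictionary step is needed; a sketch, with the column layout and example values assumed:

import java.util.HashMap;
import java.util.Map;

public class SymptomDataPrep {
  // Assign ids in order of first appearance
  static long idOf(Map<String, Long> dict, String key) {
    return dict.computeIfAbsent(key, k -> (long) dict.size());
  }

  public static void main(String[] args) {
    Map<String, Long> accounts = new HashMap<>();
    Map<String, Long> symptoms = new HashMap<>();
    String[][] rows = { {"acme", "slow login"}, {"acme", "timeout"}, {"globex", "timeout"} };
    for (String[] row : rows) {
      // one boolean "preference" line per (account, symptom) observation
      System.out.println(idOf(accounts, row[0]) + "," + idOf(symptoms, row[1]));
    }
  }
}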
What are the advantages and disadvantages of collaborative-filtering-based recommendation using a machine learning approach versus a graph-based approach? Say I have user purchase data (user_name, user_location, user_company_name, product_name, product_price, product_ingredients) and would like to recommend products to a user based on what other users from the same location or company are buying, as well as on product price, ingredients, etc. How do I decide which of them is suitable for a given use case? I would like to evaluate Neo4j (Graph …
Can anyone tell me where I can find documentation for parameters like -stepOffset, -alpha, and -decayExponent in Mahout's OnlineLogisticRegression? I am interested in what they change in calls like this one:

int FEATURES = 10000;
OnlineLogisticRegression learningAlgorithm =
    new OnlineLogisticRegression(20, FEATURES, new L1())
        .alpha(1).stepOffset(1000).decayExponent(0.9)
        .lambda(3.0e-5).learningRate(20);
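A minimal usage sketch for context (my reading of the API, not official documentation): lambda() sets the regularization weight and learningRate() the initial rate, while alpha(), stepOffset() and decayExponent() shape how the learning rate anneals as training steps accumulate; the exact schedule is best confirmed in the OnlineLogisticRegression source.

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class OlrSketch {
  public static void main(String[] args) {
    int features = 10000;
    OnlineLogisticRegression olr = new OnlineLogisticRegression(20, features, new L1())
        .alpha(1).stepOffset(1000).decayExponent(0.9).lambda(3.0e-5).learningRate(20);

    Vector x = new RandomAccessSparseVector(features);
    x.set(123, 1.0);                       // made-up feature index
    olr.train(3, x);                       // 3 = example target category (0..19)
    System.out.println(olr.classifyFull(x).maxValueIndex());
  }
}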
I'm trying to cluster and classify users with Mahout. At the moment I am at the planning phase; my mind is completely mixed up with ideas, and since I'm relatively new to the area I'm stuck on the data formatting. Let's say we have two data tables (big enough). In the first table there are users and their actions. Every user has at least one action, and some can have a great many. About 10,000 different user_actions and millions of …
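One common formatting for the first table is one sparse vector per user over the ~10,000 distinct user_actions; this is only a sketch under the assumption that raw counts are acceptable weights, and Mahout's clustering drivers would then consume these vectors once they are written out as (key, VectorWritable) sequence files:

import java.util.HashMap;
import java.util.Map;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class UserActionVectors {
  public static void main(String[] args) {
    Map<String, Integer> actionIndex = new HashMap<>();        // action name -> column
    String[] actionsOfOneUser = {"login", "search", "search", "purchase"};

    Vector v = new RandomAccessSparseVector(10_000);
    for (String action : actionsOfOneUser) {
      int col = actionIndex.computeIfAbsent(action, a -> actionIndex.size());
      v.set(col, v.get(col) + 1);          // simple count weighting
    }
    System.out.println(v);
  }
}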
I have an item-item similarity matrix, e.g. (the matrix is symmetric, and much bigger):

1.00 0.88 0.96 0.99
0.88 1.00 0.99 0.96
0.96 0.99 1.00 0.86
0.99 0.96 0.86 1.00

I need to implement a recommender which, for a set of items, recommends a new set of items. I was thinking about using SVD to reduce the items to an n-dimensional space, let's say 50-dimensional, so each item is represented by a vector of 50 numbers, and the similarity between two items is …
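A sketch of that reduced-space idea with mahout-math, using the tiny 4x4 example and rank 2 in place of the real matrix and 50 dimensions; the viewPart/viewRow slicing is my assumption of the most direct way to pull out one vector per item, and recommendations would then come from nearest vectors in that space:

import org.apache.mahout.math.DenseMatrix;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.SingularValueDecomposition;
import org.apache.mahout.math.Vector;

public class SvdItemSpace {
  public static void main(String[] args) {
    double[][] s = {
        {1.00, 0.88, 0.96, 0.99},
        {0.88, 1.00, 0.99, 0.96},
        {0.96, 0.99, 1.00, 0.86},
        {0.99, 0.96, 0.86, 1.00}
    };
    int k = 2;                                   // would be ~50 on the real matrix
    SingularValueDecomposition svd = new SingularValueDecomposition(new DenseMatrix(s));
    // keep the first k columns of U: one k-dimensional vector per item
    Matrix itemVectors = svd.getU().viewPart(0, s.length, 0, k);
    for (int i = 0; i < s.length; i++) {
      Vector vi = itemVectors.viewRow(i);
      System.out.println("item " + i + " -> " + vi);
    }
  }
}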