apache-mahout

Creating a data model for a Mahout recommendation engine

Sreejithc321

April 18, 2022 18:15

I am trying to build an item-item similarity recommendation engine with Mahout. The data set is in the following format (the attributes are text, not numeric):

name : category : cost : ingredients
x : xx1 : 15 : xxx1, xxx2, xxx3
y : yy1 : 14 : yyy1, yyy2, yyy3
z : xx1 : 12 : xxx1, xxy1

So in order to use this data set to train Mahout, what is the right …

Topic: apache-mahout dataset recommender-system data-mining machine-learning

Category: Data Science
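A note on the data-model question above: Mahout's Taste `FileDataModel` reads delimited lines of numeric IDs (`userID,itemID[,preference]`), so text attributes have to be mapped to numbers first. The sketch below is only one possible encoding and is an assumption, not something the question confirms: it treats every category and ingredient as a pseudo-user and writes one `featureId,itemId` line per attribute. `TextIdDictionary` is a hypothetical helper, not a Mahout class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Mahout): assigns a stable numeric ID
// to each distinct text value, in first-seen order.
final class TextIdDictionary {
    private final Map<String, Long> ids = new LinkedHashMap<>();

    // Returns the existing ID for a value, or allocates the next one.
    long idFor(String value) {
        String key = value.trim();
        Long existing = ids.get(key);
        if (existing != null) {
            return existing;
        }
        long next = ids.size() + 1L;
        ids.put(key, next);
        return next;
    }

    int size() {
        return ids.size();
    }

    public static void main(String[] args) {
        TextIdDictionary items = new TextIdDictionary();
        TextIdDictionary features = new TextIdDictionary();
        // Rows copied from the question: name : category : cost : ingredients
        String[] rows = {
            "x : xx1 : 15 : xxx1, xxx2, xxx3",
            "y : yy1 : 14 : yyy1, yyy2, yyy3",
            "z : xx1 : 12 : xxx1, xxy1"
        };
        for (String row : rows) {
            String[] cols = row.split(" : ");
            long itemId = items.idFor(cols[0]);
            // Assumed encoding: one "featureId,itemId" line per attribute value.
            System.out.println(features.idFor("category=" + cols[1]) + "," + itemId);
            for (String ingredient : cols[3].split(",")) {
                System.out.println(features.idFor("ingredient=" + ingredient.trim()) + "," + itemId);
            }
        }
    }
}
```

The cost column is left out here because it is numeric and would need its own treatment (for example bucketing), which the excerpt does not specify.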

Building a recommendation engine with Python

Sreejithc321

August 3, 2020 10:27

Which Python libraries are equivalent to, or more advanced than, Mahout for building recommendation systems with collaborative filtering and content-based filtering? Also, is there a way to integrate Mahout with Python?

Topic: apache-mahout software-recommendation python

Category: Data Science

Using Spark for finding similar users to a user?

Nikhil Verma

January 29, 2018 16:40

I read about https://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html but couldn't find a Spark library for this implementation. I have a columnar string dataset covering around 15-20 million users, with their show_watched, times_watched, genre, channel and some more columns. I need to calculate lookalikes for a user (or for 100k users). How do I find lookalikes for them quickly? I have tried indexing the data in Solr and then using Solr MLT to find similar users, but that takes a …

Topic: similar-documents apache-mahout apache-spark

Category: Data Science

Mahout Spark shell not working

Dimag Kharab

July 16, 2017 04:47

I installed Hadoop, Mahout and Spark. I am able to see the Hadoop and Spark master web UIs. Moreover, I can also run the following command:

[hadoop@muildevcel01 mahout]$ bin/mahout

However, when I try running the spark-shell, I run into the problem stated below:

[hadoop@muildevcel01 mahout]$ bin/mahout spark-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/repl/SparkILoop
    at org.apache.mahout.sparkbindings.shell.Main.main(Main.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.repl.SparkILoop
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more

Question: Any suggestions on how I could resolve my problem?

Topic: apache-mahout apache-spark apache-hadoop

Category: Data Science

Recommendation for a boolean dataset with Apache Mahout

pre

April 7, 2017 12:52

I was trying to implement an item-based recommender system with a boolean dataset. Dataset example:

User-id | movie-id | Action | Comedy | Drama
1 | 200 | 0 | 1 | 1
2 | 210 | 1 | 1 | 0

I tried implementing it with an item-based similarity algorithm as follows:

package prediction.contentrecommender;

import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender;

public class ContentRecommender {
    /**
     * @param args
     */
    public static …

Topic: apache-mahout recommender-system machine-learning

Category: Data Science
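For readers unfamiliar with the Tanimoto coefficient imported in the question above: on boolean data it is the size of the intersection of two sets divided by the size of their union. The following is a minimal plain-Java sketch of that formula, not Mahout's `TanimotoCoefficientSimilarity` itself; the class name and the choice of "set of user IDs who have each item" as the input are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: Tanimoto (Jaccard) coefficient on boolean preferences.
final class TanimotoSketch {
    // intersection size / union size; returns 0 when both sets are empty.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        // Hypothetical data: users who have item A and users who have item B.
        Set<Long> usersOfA = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> usersOfB = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        System.out.println(tanimoto(usersOfA, usersOfB));
    }
}
```

Because only set membership matters, no rating values are needed, which is why this measure is a common choice for boolean datasets.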

Item-based collaborative filtering with no ratings

sheldonkreger

May 31, 2016 16:18

I am building a recommender for web pages. For each web page in our data set, we wish to generate a list of web pages that other users have also visited. Our data only shows whether a user has visited a page or not. Users do not provide any ratings of our web pages. This is a good task for item-based recommendation. However, most of the algorithms (such as the one in Mahout) require rating data. …

Topic: apache-mahout recommender-system open-source

Category: Data Science
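The "also visited" list described in the question above can be built from visit data alone by counting, for each pair of pages, how many users visited both. The sketch below shows that co-visitation count in plain Java. It is a simplified illustration under the assumption that the data fits in memory; the class and method names are hypothetical and it does not use Mahout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: counts how many users visited each pair of pages.
final class CoVisitSketch {
    // visits maps a user to the set of pages that user visited.
    // The result maps page -> (other page -> number of users who visited both).
    static Map<String, Map<String, Integer>> coVisits(Map<String, Set<String>> visits) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Set<String> pages : visits.values()) {
            for (String p : pages) {
                for (String q : pages) {
                    if (!p.equals(q)) {
                        counts.computeIfAbsent(p, k -> new HashMap<>()).merge(q, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }
}
```

Sorting each inner map by count gives a raw "also visited" ranking. Raw counts favor globally popular pages, so a normalized measure such as Tanimoto or log-likelihood is often applied on top.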

Unknown program 'spark-itemsimilarity' chosen

CrazyBrazilian

May 19, 2016 00:22

I have Cloudera CDH5 running inside VirtualBox. When I try to run: mahout spark-itemsimilarity .... I get the error: Unknown program 'spark-itemsimilarity' chosen. Do I have to install any additional package to run spark-itemsimilarity? Any help would be appreciated!

Topic: apache-mahout apache-spark recommender-system

Category: Data Science

Interpretation of Similarity Number generated by LogLikelihood in Mahout

CrazyBrazilian

May 10, 2016 19:35

I have a pretty basic question and I was hoping someone could help me. I'm not a math person and I'm fairly new to Mahout, so I'm looking for a poor man's explanation. It is a typical order recommendation system. I have a database with around 699,445 orders. These orders have items which were "purchased". I ran the following Mahout command:

mahout itemsimilarity --input /mnt/p1.csv --output ./output --similarityClassname SIMILARITY_LOGLIKELIHOOD --booleanData TRUE --threshold 0.9

I decided to spot-check the results. I …

Topic: apache-mahout recommender-system

Category: Data Science
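To make the log-likelihood number in the question above less opaque, here is a plain-Java sketch of the log-likelihood ratio (LLR) for a 2x2 co-occurrence table, written in the entropy form. The final mapping to a 0..1 score, `1 - 1/(1 + LLR)`, is how I understand Mahout to scale the value, but treat that as an assumption and check it against your Mahout version. The class name is hypothetical.

```java
// Illustrative only: LLR for a 2x2 table of order counts.
final class LogLikelihoodSketch {
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a list of counts.
    static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    // k11: orders with both items, k12: only item A,
    // k21: only item B, k22: neither item.
    static double llr(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Assumed scaling of the raw LLR into the range [0, 1).
    static double similarity(double llr) {
        return 1.0 - 1.0 / (1.0 + llr);
    }

    public static void main(String[] args) {
        double raw = llr(100, 5, 5, 1000);
        System.out.println("LLR = " + raw + ", similarity = " + similarity(raw));
    }
}
```

The score is a measure of how surprising the co-occurrence is compared with independence, not a probability. Under the assumed scaling, a threshold of 0.9 corresponds to a raw LLR of about 9.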

mahout clusterdump top terms meaning

Chris

April 1, 2016 13:07

I apologize if this has been asked before, and I feel the answer may be obvious, but I am wondering what exactly the numerical value below from clusterdump means:

Top Terms: monkey => 0.8170868432876803

I believe that to be the center of the centroid. But if the term vectors were created with term frequencies, could one interpret this as the average occurrence of "monkey" in the documents that are considered part of the cluster? In this case, "monkey" …

Topic: apache-mahout k-means

Category: Data Science
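On the clusterdump question above: in standard k-means, a centroid is the component-wise mean of the vectors assigned to the cluster, so with raw term-frequency vectors each centroid component would be the average count of that term over the cluster's documents. The sketch below only illustrates that arithmetic with made-up vectors. Whether the clusterdump value matches it exactly depends on how the vectors were weighted and normalized, which the excerpt does not say.

```java
// Illustrative only: a k-means centroid as the mean of its member vectors.
final class CentroidSketch {
    // Averages each dimension over the member vectors (all assumed to have the same length).
    static double[] centroid(double[][] members) {
        double[] mean = new double[members[0].length];
        for (double[] vector : members) {
            for (int i = 0; i < mean.length; i++) {
                mean[i] += vector[i];
            }
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= members.length;
        }
        return mean;
    }

    public static void main(String[] args) {
        // Hypothetical term-frequency vectors; column 0 stands for "monkey".
        double[][] docs = { {1, 0}, {0, 2}, {2, 1} };
        System.out.println(centroid(docs)[0]); // average "monkey" count in the cluster
    }
}
```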

N-fold cross validation in Mahout

Sreejithc321

January 29, 2016 06:16

Is there a method/class available in Apache Mahout to perform n-fold cross validation? If so, how can it be done?

Topic: apache-mahout java data-mining machine-learning

Category: Data Science
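Independently of whether Mahout ships a class for this, the fold assignment behind n-fold cross validation is easy to do by hand. The sketch below shuffles record indices with a fixed seed and deals them into k folds; each fold then serves once as the test set while the others form the training set. The class name, method name and seed are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only: splits record indices 0..n-1 into k folds.
final class KFoldSketch {
    // Shuffles the indices with a fixed seed, then deals them round-robin into k folds.
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            result.get(i % k).add(indices.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(10, 3, 42L);
        for (int test = 0; test < folds.size(); test++) {
            // Train on every fold except folds.get(test), then evaluate on it.
            System.out.println("test fold " + test + ": " + folds.get(test));
        }
    }
}
```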

Content-based recommendation on Mahout

Sreejithc321

December 21, 2015 21:46

Is it possible to get recommendations for similar products using Mahout? For example, I have a data set of movies with the following attributes: Movie_name, Actor_1, Actor_2, Actress_1, Actress_2, Director, Theme, Language. Given a Movie_name, the system should recommend the top 3 similar movies based on these attributes. Can this be done using Mahout? If so, how?

Topic: apache-mahout python recommender-system

Category: Data Science

Mimic a Mahout-like system

SRS

November 20, 2015 01:08

I have a data set, in Excel format, with account names, reported symptoms, a determined root cause and a date in month-year format for each row. I am trying to implement a Mahout-like system to determine the symptoms an account is likely to report, using something like user-based similarity. Technically, I am just hoping to tweak the recommendation system into a deterministic system to spot the probable symptoms an account …

Topic: apache-mahout similarity recommender-system

Category: Data Science

Collaborative filtering using graph and machine learning

Sreejithc321

November 4, 2015 14:46

What are the advantages and disadvantages of collaborative-filtering recommendation using a machine learning approach versus a graph-based approach? Say I have user purchase data (user_name, user_location, user_company_name, product_name, product_price, product_ingredients) and would like to recommend products to a user based on what other users from the same location or company are buying, and on product price, ingredients, etc. How do I decide which of them is suitable for a given use case? I would like to evaluate Neo4j (Graph …

Topic: apache-mahout graphs neo4j machine-learning

Category: Data Science

Parameters for OnlineLogisticRegression function in Mahout

Marcin Kosiński

July 9, 2015 09:50

Can anyone tell me where I can find documentation for parameters like -stepOffset, -alpha and -decayExponent of OnlineLogisticRegression in Mahout? I am interested in what they change in calls like this one:

int FEATURES = 10000;
OnlineLogisticRegression learningAlgorithm = new OnlineLogisticRegression(20, FEATURES, new L1())
    .alpha(1).stepOffset(1000).decayExponent(0.9).lambda(3.0e-5).learningRate(20);

Topic: apache-mahout online-learning logistic-regression

Category: Data Science
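As far as I can tell from reading the Mahout source, the three parameters in the question above only shape how the learning rate shrinks over training steps: `learningRate` is the initial rate, `alpha` is a per-step exponential decay factor, and `stepOffset` with `decayExponent` gives a polynomial decay. The sketch below reproduces that schedule in plain Java. The formula and the sign handling of the exponent are my reading of the source, not documented behavior, so verify them against your version; the class and method names are hypothetical.

```java
// Illustrative only: the learning-rate schedule as I understand it from
// Mahout's OnlineLogisticRegression source. This is an assumption, not documented API.
final class LearningRateSketch {
    // mu0: initial learning rate (learningRate), alpha: exponential decay factor,
    // stepOffset and decayExponent: polynomial decay; step: training examples seen so far.
    static double rate(double mu0, double alpha, double stepOffset, double decayExponent, int step) {
        // Assumed: a positive decayExponent is applied as a negative power.
        double exponent = -Math.abs(decayExponent);
        return mu0 * Math.pow(alpha, step) * Math.pow(step + stepOffset, exponent);
    }

    public static void main(String[] args) {
        // Values taken from the call in the question.
        for (int step : new int[] {0, 1000, 100000}) {
            System.out.println("step " + step + ": " + rate(20, 1, 1000, 0.9, step));
        }
    }
}
```

With `alpha(1)` the exponential term is constant, so in the question's call only the polynomial term reduces the rate. A larger `stepOffset` lowers the starting rate and flattens the early decay. `lambda` is separate: it weights the regularization applied through the `L1` prior.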

User profiling with Mahout from categorized user behavior

Turcia

July 1, 2015 21:43

I'm trying to cluster and classify users with Mahout. I am still at the planning phase, my head is full of competing ideas, and since I'm relatively new to the area I'm stuck on the data formatting. Let's say we have two (sufficiently large) data tables. In the first table there are users and their actions. Every user has at least one action, and some can have a great many. About 10000 different user_actions and millions of …

Topic: apache-mahout classification clustering

Category: Data Science

Item-based recommender using SVD

Ognjen

March 26, 2015 03:39

I have an item-item similarity matrix, e.g. (the matrix is symmetric, and much bigger):

1.00 0.88 0.96 0.99
0.88 1.00 0.99 0.96
0.96 0.99 1.00 0.86
0.99 0.96 0.86 1.00

I need to implement a recommender which, for a set of items, recommends a new set of items. I was thinking about using SVD to reduce the items to an n-dimensional space, let's say a 50-dimensional space, so each item is represented by a vector of 50 numbers, and similarity between two items is …

Topic: apache-mahout recommender-system

Category: Data Science