Association Rule Mining across two market baskets

I am quite familiar with Association Rule mining but I need to use it to associate ACROSS two market baskets instead of finding support WITHIN a market basket. Imagine customers come to a Store A and buy a certain number of products. The same customers go to Store B and buy another set of products. I want to associate between the two Stores and not within the Store. So I want to make "A --> B" statements like "Customers that …
Category: Data Science
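
One way to picture the "A --> B" statements asked about above is to count, per customer, which Store A items co-occur with which Store B items. The following is only a minimal sketch under stated assumptions: the `purchases` data, the item names, and the `cross_store_rules` helper are all hypothetical, and only single-item rules are considered.

```python
from collections import Counter

# Hypothetical data: customer -> (items bought at Store A, items bought at Store B).
purchases = {
    "c1": ({"tent", "stove"}, {"boots"}),
    "c2": ({"tent"}, {"boots", "map"}),
    "c3": ({"stove"}, {"map"}),
    "c4": ({"tent"}, {"map"}),
}

def cross_store_rules(purchases, min_confidence=0.5):
    """Return {(a, b): confidence} for single-item rules 'a at Store A -> b at Store B'."""
    a_counts = Counter()   # customers who bought a at Store A
    ab_counts = Counter()  # customers who bought a at Store A and b at Store B
    for items_a, items_b in purchases.values():
        for a in items_a:
            a_counts[a] += 1
            for b in items_b:
                ab_counts[(a, b)] += 1
    rules = {}
    for (a, b), n_ab in ab_counts.items():
        confidence = n_ab / a_counts[a]
        if confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules

rules = cross_store_rules(purchases)
for (a, b), conf in sorted(rules.items()):
    print(f"{a} (Store A) -> {b} (Store B): confidence {conf:.2f}")
```

Because antecedents come only from Store A and consequents only from Store B, no within-store pairs are ever counted.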

What is Zhang's rule?

I'd been doing some reading on Association Rule Mining and bumped into a Kaggle dataset where a competitor had applied Zhang's rule. I would like to know what it is. I tried to look for it online, and most of the hits revolve around some Chinese emperor by that name who ruled China. The other results aren't really relevant. If there is anything that you can share about it, like its significance, that'd be great. There's also no tag …
Category: Data Science

Issue with using Sparse Data Frame in Mlxtend Apriori function

I am running Python 2.7 in Anaconda and have installed mlxtend. According to the latest version of mlxtend, the apriori function supports a sparse DataFrame as its input. I have over 500k products that I want to run a market basket analysis on. I have created a one-hot encoded sparse DataFrame using a small dataset to test, but I am running into a df.to_coo() issue on the sparse DataFrame inside the mlxtend apriori function. Please find the code, the input data …
Category: Data Science

Find changes in variables into two states

I have a dataframe like this: dframe <- structure(list(c(60, 91, 377, 419, 893, 905), c(-0.6647, -0.0275000000000001, -0.6311, 0.1328, -0.4559, -1.0208), c(-1.6964, -1.3851, -1.1428, -1.4191, -1.2979, -1.441), c(4.1104, 2.998, 3.4623, 1.9545, 3.5166, 3.9912), c(-1.6663, -1.0789, -1.6608, -1.0137, -1.4022, -1.6189 ), c(0.902, 0.5417, 0.2651, -0.4998, 0.72, 1.0902), c(0.061, -0.1321, -0.6613, -0.9655, -0.3879, -0.3222), c(0.6573, -1.8156, -1.1072, -1.6147, -1.7412, -0.8048), c(-1.6561, 3.3495, 3.1694, 4.7327, 3.7275, 3.0135), c(0.2499, -1.5437, -1.3843, -1.8279, -1.487, -1.133), c(1.1265, 0.2224, 0.5074, 0.9983, 0.4906, 0.3672 ), structure(c(3, 1, 3, 1, …
Category: Data Science

How to generate more market basket association rules for products with smaller basket sizes?

I'm working with data where many customers only buy 1-3 products at a time, meaning that there aren't enough products being purchased together for the market basket algorithm to determine associations. Any idea how I can get around this? I'm thinking of grouping transactions together by week or month to get larger basket sizes, but I'm skeptical of that approach since customers can place many orders in a week that have nothing to do with each other.
Category: Data Science

FP-Growth and Association Rules

I have recently worked through this video and created a dummy dataset, FP-tree and conditional database. I have a few questions regarding the outputs: (1) Are these example outputs correct given the inputs? (2) Are association rules generated by utilising the frequent patterns found? (3) If yes to 2, is there almost a 'skip-level' association in play? For example, for the Carrots patterns, I can see the flow is Beer -> Diaper -> Carrots. But could a frequent pattern also be Diaper -> Carrots …
Category: Data Science

How to figure out what elements are missing from a set, based on other sets?

I would like to solve a problem where I have a set of sets of possible values, but some elements of some sets are corrupted/deleted, so I have to figure out the most probable candidate replacement for each corrupted value. There is a set of possible elements: E1, E2, E3 ... E6. I have a set of sets of elements without corruption. The presence/absence of the respective potential elements is represented with binary numbers: E1 E2 E3 …
Category: Data Science

How to present Market Basket Analysis Results?

I am working on a retail company's in-store transactions for 3 months. I have performed a Market Basket Analysis on them and I'm getting hundreds if not thousands of association rules. I am using the apriori algorithm (from mlxtend.frequent_patterns import apriori) in Python and I have used different support values in apriori(basket_sets, min_support=0.01, use_colnames=True), all the way from 0.01 to 0.4. If I use a support value that is too high (for some stores no rules are found), there are …
Category: Data Science

100 items 100 baskets divisor association analysis problem

I have the following exercise question: Suppose there are 100 items, numbered 1 to 100, and also 100 baskets, also numbered 1 to 100. Item i is in basket b if and only if i divides b with no remainder. Thus, item 1 is in all the baskets, item 2 is in all fifty of the even-numbered baskets, etc. For example, basket 12 consists of items {1, 2, 3, 4, 6, 12}. Given this, I'm trying to solve the following …
Category: Data Science
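
The basket construction in the exercise above can be checked directly. This is a minimal sketch of the setup only (the exercise's actual sub-questions are cut off in the excerpt); the `support_count` helper name is an assumption made for illustration.

```python
N = 100

# Item i is in basket b if and only if i divides b with no remainder.
baskets = {b: {i for i in range(1, N + 1) if b % i == 0} for b in range(1, N + 1)}

def support_count(itemset):
    """Number of baskets that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for items in baskets.values() if wanted <= items)

print(sorted(baskets[12]))   # [1, 2, 3, 4, 6, 12]
print(support_count({1}))    # 100
print(support_count({2}))    # 50
```

An itemset's support count here equals the number of multiples of its least common multiple that are at most 100, which is a handy cross-check for any answer.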

How to find the position of a company given other companies in a Pandas DataFrame with Python

So I have a Pandas DataFrame. I am doing some data analysis with Python on some sets of companies, based on the products they offer on my website. For example, I have a column called ProductID and another called Company. To find the most trending products on my website, I can do: df.ProductID.value_counts(normalize=True).nlargest(10).plot(kind="bar") Each product is offered by a company, and a company could have several products, that is, a company can have several ProductIDs, i.e. …
Category: Data Science

How should I tackle this real-life hypermarket problem?

I registered myself in the payback program of the hypermarket I go to. For every 2$ I get 1 point. I buy the same products every week (Feta 2.19$, Milk 0.99$, ...). I visit only on weekdays. I would like to maximize the number of points I gather, while also maximizing the number of times I visit that hypermarket (so buying all the stuff at once is an awful solution). How should I go about modeling this real-life situation in …
Category: Data Science

Identifying Customers who are more likely to purchase a given product category- Which model to use?

I'm pretty new to data science. I am working on a model to identify customers who are more likely to purchase a given product category. I tried market basket analysis using the arules package from R. I also used MatchIt, a matching algorithm, and tried propensity score matching to find similar customers who tend to purchase in the given product category. But I'm not quite sure the model is good enough for this kind of problem. I'm just wondering what kind of model …
Category: Data Science

Market-basket: calculating support/confidence/lift/rules

How can I calculate support/confidence/lift on a dataset in order to find frequent itemsets and determine association rules, in Python? What would be the most effective method for predicting and offering recommendations on a test set of incomplete "shopping carts"? I am limited to the Anaconda distribution, so I can't use packages such as orange3, etc.
Category: Data Science
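
For reference, the three measures named in the question above can be computed with the standard library alone. This is a minimal sketch, not a full frequent-itemset miner: the toy `transactions` and the helper names `support` and `rule_metrics` are hypothetical, and it uses the usual definitions support(X) = fraction of transactions containing X, confidence(A -> C) = support(A ∪ C) / support(A), and lift(A -> C) = confidence(A -> C) / support(C).

```python
# Hypothetical toy transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_ac = support(set(antecedent) | set(consequent), transactions)
    confidence = s_ac / s_a
    lift = confidence / s_c
    return s_ac, confidence, lift

s, c, l = rule_metrics({"bread"}, {"milk"}, transactions)
print(f"support={s:.3f} confidence={c:.3f} lift={l:.3f}")
```

A lift below 1, as in this toy rule, means the consequent is less likely given the antecedent than it is overall.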

Very low accuracy of new data compared to validation data

I'm trying to train a neural network to predict the movement of a particular security on the market. I train on historical data collected over one year. The input to the neural network is candlesticks: close price and value. Before being fed in, these data are normalized separately for each dataset, using the z-score algorithm. Then the question immediately arises... the output cannot be kept within the range [0;1] or [-1;1] and can reach up to 10 …
Category: Data Science
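
The z-score normalization mentioned above, and why its output is not confined to [0;1] or [-1;1], can be illustrated with a short sketch. The numbers are hypothetical, and reusing the training mean and standard deviation for new data is an assumption made here for illustration, not something stated in the question.

```python
from statistics import mean, pstdev

def fit_zscore(values):
    """Return the mean and population standard deviation of `values`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot z-score a constant series")
    return mu, sigma

def apply_zscore(values, mu, sigma):
    """Standardize `values` with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]

train = [10.0, 12.0, 14.0]   # hypothetical close prices
new = [40.0]                 # hypothetical later price, far from the training range

mu, sigma = fit_zscore(train)
train_z = apply_zscore(train, mu, sigma)
new_z = apply_zscore(new, mu, sigma)
print(train_z, new_z)
```

Z-scores are centred on zero with unit variance but have no fixed bounds, so values far from the fitted mean map to large magnitudes.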

Is web analytics similar to data science?

I just finished my PhD and initially wished to work in data science and deep learning. However, after some rounds of interviews, I have been offered a job in web analytics and business intelligence at a medium-sized company. Is there any similarity with data science, and is there a future in it? Because of a precarious situation, I have to accept this job, but should I keep looking for another job meanwhile, or will the experience be helpful to rise …
Category: Data Science

Market Basket Analysis - Data Modelling

Imagine that I have the following dataset:

Customer_ID  Product_Desc
1            Jeans
1            T-Shirt
1            Food
2            Jeans
2            Food
2            Nightdress
2            T-Shirt
2            Hat
3            Jeans
3            Food
4            Food
4            Water
5            Water
5            Food
5            Beer

I need to model consumer behaviour and predict which products are associated. To do that, I think a good strategy would be to build the relationships first and then count the occurrences (I don't know if anyone has a better idea). The first …
Category: Data Science
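
The "build the relationships first, then count the occurrences" idea above can be sketched on the dataset from the question. This is only one possible reading: it assumes each Customer_ID forms a single basket, and the variable names are chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# (Customer_ID, Product_Desc) rows copied from the question.
rows = [
    (1, "Jeans"), (1, "T-Shirt"), (1, "Food"),
    (2, "Jeans"), (2, "Food"), (2, "Nightdress"), (2, "T-Shirt"), (2, "Hat"),
    (3, "Jeans"), (3, "Food"),
    (4, "Food"), (4, "Water"),
    (5, "Water"), (5, "Food"), (5, "Beer"),
]

# Assumption: one basket per customer.
baskets = {}
for customer_id, product in rows:
    baskets.setdefault(customer_id, set()).add(product)

# Count every unordered product pair that appears in the same basket.
pair_counts = Counter()
for products in baskets.values():
    for pair in combinations(sorted(products), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common(4):
    print(pair, count)
```

Sorting the products before pairing keeps each pair in a single canonical order, so ("Food", "Jeans") and ("Jeans", "Food") are never counted separately.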

Correlation between products based on purchases placed around the same date

Association rule learning has a fair bit of material based around the correlation of products purchased on the same order/at the same time. However, I'd like to discover whether there is a method for identifying such a relationship between products that are ordered near each other, but not together. Say, for example, a customer purchases a pencil in week one, but later purchases an eraser in week two. Then a year later they do the same thing. But then also …
Category: Data Science
