Association rules for classification

I'm working on a classification project. My data has many rows, each with many binary attributes, and some of those attributes often appear together, much like in the Market Basket problem (where you can find, for example, that a customer who buys 'Milk' at a supermarket is more likely than chance to also buy 'Eggs').

My idea is to treat my target as just another attribute and extract the best item-sets that contain it (that is, item-sets with Target=1). In the example above, this is what I would do to build a model predicting whether a client will buy eggs or not.

Then, by looking at how well each item-set fits, I would learn which groups of products give the best chance of my target being 1.
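To make "fit" concrete, here is a minimal sketch. It assumes fit is measured by support and confidence, which is an assumption and not something fixed above. The function names and the toy rows are made up for illustration.

```python
# Toy data (hypothetical): each row is the set of attributes equal to 1.
rows = [
    {"Milk", "Bread", "Eggs"},
    {"Milk", "Eggs"},
    {"Milk", "Bread"},
    {"Bread"},
    {"Milk", "Bread", "Eggs"},
]

def count(itemset, rows):
    """Number of rows that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for row in rows if itemset <= row)

def support(itemset, rows):
    """Fraction of rows that contain every item of `itemset`."""
    return count(itemset, rows) / len(rows)

def confidence(antecedent, target, rows):
    """Estimate of P(target = 1 | every antecedent item = 1)."""
    n_antecedent = count(antecedent, rows)
    if n_antecedent == 0:
        return 0.0  # the antecedent never occurs, so there is nothing to estimate
    return count(set(antecedent) | {target}, rows) / n_antecedent

# Score two candidate groups of products against the target "Eggs".
for group in ({"Milk"}, {"Milk", "Bread"}):
    print(sorted(group), support(group | {"Eggs"}, rows), confidence(group, "Eggs", rows))
```

Ranking the item-sets that contain the target by confidence, with a minimum support to filter out rare ones, would then give the groups of products most associated with Target=1.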

The method looks good, but I have a problem: I'm wondering whether there is a way to extract only the item-sets that contain the target.

All the articles I found on classification based on association rules explain that one of the big downsides of the method is the time it takes, because it has to find all item-sets first and only afterwards removes the ones that do not include the target.

Is there a way, using item-set properties, to tell the algorithm (Apriori, FP-Growth, etc.) to compute only the item-sets that include the target, and not waste time computing all the others?

Topic association-rules classification python

Category Data Science


If I understand correctly, you don't have your target yet and want to create it using association rules. Frequent item-set algorithms already reduce the number of item-set checks, e.g. based on the fact that if a certain item-set doesn't meet the selected support threshold, no superset of that item-set will meet it either. Apriori relies on this property too; FP-Growth is usually considerably faster than Apriori because it also avoids generating candidate item-sets by compressing the transactions into an FP-tree. If you want to create your target using association rules and then train a model to predict this target, you need to let the algorithm go over every example and discard each item-set that doesn't meet your threshold. What you can do to speed up the computation is:

  1. Use FP-Growth on a distributed system.
  2. Use even more restrictive conditions (for example a higher minimum support) to discard item-sets faster.
  3. Learn association rules only on a subset of items, for example the top 1000 bestsellers.
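
As a rough illustration of points 2 and 3, here is a minimal sketch of level-wise mining in plain Python. It is Apriori-style rather than FP-Growth, and it is not distributed, so point 1 is not covered. The function name `frequent_itemsets`, its `top_k` parameter, and the toy rows are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(rows, min_support, top_k=None):
    """Return {item-set: support} for every item-set meeting `min_support`."""
    n = len(rows)
    item_counts = Counter(item for row in rows for item in row)

    # Point 3: optionally keep only the `top_k` most frequent items.
    if top_k is not None:
        allowed = {item for item, _ in item_counts.most_common(top_k)}
    else:
        allowed = set(item_counts)

    # Point 2: a higher `min_support` leaves fewer item-sets at every level.
    level = {frozenset([i]) for i in allowed if item_counts[i] / n >= min_support}
    result = {s: item_counts[next(iter(s))] / n for s in level}

    k = 2
    while level:
        # Join: build size-k candidates from the frequent (k-1)-item-sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: a candidate with an infrequent (k-1)-subset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Count the surviving candidates in one pass over the rows.
        counts = Counter()
        for row in rows:
            for c in candidates:
                if c <= row:
                    counts[c] += 1
        level = {c for c in candidates if counts[c] / n >= min_support}
        result.update({c: counts[c] / n for c in level})
        k += 1
    return result

# Toy data (hypothetical): each row is the set of items bought together.
rows = [
    {"Milk", "Bread", "Eggs"},
    {"Milk", "Eggs"},
    {"Milk", "Bread"},
    {"Bread"},
    {"Milk", "Bread", "Eggs"},
]

print(len(frequent_itemsets(rows, min_support=0.4)))           # baseline
print(len(frequent_itemsets(rows, min_support=0.7)))           # stricter threshold
print(len(frequent_itemsets(rows, min_support=0.4, top_k=2)))  # fewer items
```

Both a stricter threshold and a smaller item vocabulary shrink the set of candidates that has to be counted at each level, which is where most of the time goes.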
