Scalable way to group users with similar titles purchased

I'm trying to figure out the best way to group customers based on the checkout items in their shopping carts. I have the baskets and what's in each basket, but I'm at a complete loss on how to group all the similar baskets. I have a group of users who I believe shouldn't be counted in my overall metrics (or should at least be flagged). These users create a new account, place 4-5 items in the cart, and check out. Then a new account is created, and the process repeats. They seem to repeat this for up to 12 hours and then change the items, and I'd like to separate those customers from "regular" customers.

My issue is that it's not always the same 4-5 items, and I can't figure out how to scale up the 'search'. If I knew it was Item1-Item5, it would be an easy search, even comparing everyone's cart to that, but comparing everyone's cart to everyone else's cart seems like it won't scale at all. (I'd also like to catch the 'marginal' cases: if I'm looking for 5 titles, carts that contain 4 of them, or all 5 plus one new one, etc.)
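The 'marginal' cases map naturally onto Jaccard similarity (items two carts share, divided by all distinct items in either cart): 4 of 5 titles scores 4/5 = 0.8, and all 5 plus one extra scores 5/6. A minimal sketch, assuming each cart is a set of titles keyed by a hypothetical account id and using 0.6 as an arbitrary illustrative cutoff, avoids the all-pairs comparison with an inverted index so that only carts sharing at least one title are ever compared:

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity: shared items / all distinct items in either cart."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similar_cart_pairs(carts, threshold=0.6):
    """Return (account_a, account_b, score) for pairs with Jaccard >= threshold.

    An inverted index (title -> accounts that bought it) means only accounts
    sharing at least one title are compared, instead of all n*(n-1)/2 pairs.
    """
    index = defaultdict(set)
    for account, items in carts.items():
        for item in items:
            index[item].add(account)

    candidates = set()
    for accounts in index.values():
        # sorted() gives each pair one fixed (a, b) order, so the set dedupes it
        candidates.update(combinations(sorted(accounts), 2))

    pairs = []
    for a, b in sorted(candidates):
        score = jaccard(carts[a], carts[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical carts, keyed by a made-up account id.
carts = {
    "u1": frozenset({"item1", "item2", "item3", "item4", "item5"}),
    "u2": frozenset({"item1", "item2", "item3", "item4"}),  # 4 of the 5
    "u3": frozenset({"item1", "item2", "item3", "item4", "item5", "item9"}),  # 5 + 1
    "u4": frozenset({"item7", "item8"}),
    "u5": frozenset({"item2", "item8"}),
}

for a, b, score in similar_cart_pairs(carts):
    print(a, b, round(score, 2))
# prints:
# u1 u2 0.8
# u1 u3 0.83
# u2 u3 0.67
```

Titles bought by nearly every account would still generate almost all pairs, so this only goes so far. At larger scale the usual search terms are MinHash and locality-sensitive hashing (LSH), which the `datasketch` Python package implements, and the matching pairs can then be merged into groups with a connected-components step.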

I'm completely new to this, and even pointers to what terms I should be searching for, or what software packages I should aim to be learning a bit of would be very appreciated.

Tags: similar-documents, similarity, clustering



If you want to group your dataset into k different groups based on some feature (here, checkout history), you can use the K-means clustering algorithm. You will find scikit-learn's `sklearn.cluster.KMeans` helpful: all you need to do is feed it the data and choose appropriate hyperparameters, such as the number of clusters k.
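A minimal sketch of that suggestion, assuming each basket is turned into a 0/1 vector over all titles with scikit-learn's `MultiLabelBinarizer` and that k = 2 is already known; the basket data is hypothetical:

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical data: one list of purchased titles per account.
baskets = [
    ["item1", "item2", "item3", "item4", "item5"],
    ["item1", "item2", "item3", "item4"],
    ["item1", "item2", "item3", "item4", "item5", "item9"],
    ["item6", "item7"],
    ["item7", "item8"],
    ["item6", "item8"],
]

# One row per account, one 0/1 column per distinct title.
mlb = MultiLabelBinarizer()
X = mlb.fit_transform(baskets)

# n_clusters (k) is the main hyperparameter and has to be chosen by you;
# n_init and random_state just make the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)  # accounts with the same label were put in the same cluster
```

Keep in mind that K-means uses Euclidean distance on these vectors, which is not the same as set overlap, and that k must be picked by the analyst (for example by comparing several values), so the labels are best treated as a starting point for inspection.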


If I understood correctly, this issue seems somewhat similar to a recommendation-system problem, in which you have several titles you want to recommend to users based on their previous interactions.

Maybe you could read up on that? You can search for algorithms such as matrix factorization / collaborative filtering.
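As a rough illustration of matrix factorization, assuming a small hypothetical account x title purchase matrix, truncated SVD via `numpy.linalg.svd` compresses each account into k latent factors, and accounts with near-identical baskets end up with near-parallel factor vectors. This is only the simplest form of the idea and not necessarily what the linked course teaches, which may learn the factors with gradient descent instead:

```python
import numpy as np

# Hypothetical account x title matrix (1 = bought). Rows 0-2 mimic the
# repeated baskets (items 1-5, one with an extra item), rows 3-5 mimic
# ordinary customers buying from items 6-8.
R = np.array([
    [1, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 1, 0],
], dtype=float)

# Factorize R ~ U_k diag(s_k) Vt_k and keep k latent factors per account.
k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
user_factors = U[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two factor vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Similar baskets give cosine close to 1, unrelated baskets close to 0.
print(cosine(user_factors[0], user_factors[1]))
print(cosine(user_factors[0], user_factors[3]))
```

The `user_factors` rows could then be compared pairwise or fed into a clustering step such as K-means; on real data, k would need tuning rather than being fixed at 2.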

A good resource for that: https://course.fast.ai/videos/?lesson=4
