Scalable way to group users with similar titles purchased
I'm trying to figure out the best way to group customers based on the items they check out in their shopping carts. I have each basket and its contents, but I'm at a complete loss as to how to group the similar baskets together. There is a group of users I believe shouldn't be counted in my overall metrics (or that I should at least acknowledge separately). These users create a new account, place 4-5 items in their cart, and check out. Then a new account is created and the process repeats. They seem to keep this up for as long as 12 hours and then change the items. I'd like to separate those customers from "regular" customers.
My issue is that it's not always the same 4-5 items, and I can't figure out how to scale up the 'search'. If I knew it was Item1-Item5, it would be an easy search, even if I had to compare everyone's cart to that list, but comparing everyone's cart to everyone else's cart seems like it won't scale at all. I'd also like to catch the 'marginals': if I'm looking for 5 titles, the carts that contain only 4 of them, or all 5 plus one new one, and so on.
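To make the 'marginals' concrete, here is a minimal sketch of the kind of partial-match scoring I have in mind. It assumes each cart is a set of item IDs and scores two carts by the size of their overlap divided by the size of their combined contents (Jaccard similarity). The account names, item names, the 0.6 threshold, and the function names are all made up for illustration, and I don't know whether this is the right measure. To avoid comparing every cart to every other cart, it only compares carts that share at least one item.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Overlap score between two carts: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(carts, threshold=0.6):
    """Return sorted (user_a, user_b, score) tuples with score >= threshold.

    Only carts that share at least one item are compared, using an
    item -> users lookup table instead of a full all-pairs scan.
    """
    # Map each item to the set of users whose cart contains it.
    users_by_item = defaultdict(set)
    for user, items in carts.items():
        for item in set(items):
            users_by_item[item].add(user)

    # Candidate pairs are users that appear together under some item.
    candidates = set()
    for users in users_by_item.values():
        candidates.update(combinations(sorted(users), 2))

    results = []
    for u, v in candidates:
        score = jaccard(carts[u], carts[v])
        if score >= threshold:
            results.append((u, v, score))
    return sorted(results)


# Hypothetical data: acct2 has 4 of the 5 items, acct3 has all 5 plus one new one.
carts = {
    "acct1": {"Item1", "Item2", "Item3", "Item4", "Item5"},
    "acct2": {"Item1", "Item2", "Item3", "Item4"},
    "acct3": {"Item1", "Item2", "Item3", "Item4", "Item5", "Item6"},
    "acct4": {"Item7", "Item8"},
}

for pair in similar_pairs(carts):
    print(pair)
```

Even this shortcut would still get expensive if one popular item shows up in a large share of carts, since every pair of those carts becomes a candidate, which is part of why I'm asking what the scalable approach is.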
I'm completely new to this, so even pointers to the terms I should be searching for, or to the software packages I should start learning, would be much appreciated.
Topic similar-documents similarity clustering
Category Data Science