Aggregating transactional data for customer segmentation

I have item-level transactional data in which each row represents one item bought by a customer in a transaction. If a customer buys two different items in the same transaction, there are two rows with the same values in the customer_id and transaction_id columns.

For example:

customer_id  transaction_id  item_bought  quantity
a            00001           cheese       2
b            00002           ham          1
b            00002           pepsi        2

In this case, customer b bought two items in the same transaction, so there are two rows with the same values in both the customer_id and transaction_id columns.
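
To make the structure concrete, here is a minimal sketch of the example rows in pandas (the choice of pandas and the variable names are my assumptions, not a requirement). It counts how many item rows belong to each customer/transaction pair:

```python
import pandas as pd

# The three example rows from the table above.
# transaction_id is kept as a string so the leading zeros survive.
items = pd.DataFrame(
    [
        ("a", "00001", "cheese", 2),
        ("b", "00002", "ham", 1),
        ("b", "00002", "pepsi", 2),
    ],
    columns=["customer_id", "transaction_id", "item_bought", "quantity"],
)

# Number of item rows for each (customer_id, transaction_id) pair.
rows_per_transaction = items.groupby(["customer_id", "transaction_id"]).size()
print(rows_per_transaction)
```

Customer b's transaction 00002 shows up with a count of 2, which is the duplication described above.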

I want to be able to cluster customers based on the sorts of items that they buy and other factors such as the time of day that they purchase items.

To do this, do I have to aggregate the data so that each customer is represented by a single row, or is it possible to set up my model in such a way that I don't have to aggregate the data? My concern is that I would also like to look at behaviour at the transaction_id level (e.g. this customer always buys a coffee in the morning and a pizza at night), and if I aggregate the data to the customer_id level I'll lose that detail.
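
To illustrate the kind of aggregation I have in mind, here is a rough sketch, not a settled approach. It assumes a hypothetical `hour` column (my sample data above has no timestamp, so the hour values are made up for illustration) and a crude morning/evening split at noon. Combining the item with the time of day before pivoting gives one row per customer while keeping some of the "coffee in the morning" detail:

```python
import pandas as pd

items = pd.DataFrame(
    {
        "customer_id": ["a", "b", "b"],
        "transaction_id": ["00001", "00002", "00002"],
        "item_bought": ["cheese", "ham", "pepsi"],
        "quantity": [2, 1, 2],
        # Hypothetical purchase hour (0-23); invented for this sketch.
        "hour": [9, 20, 20],
    }
)

# Assumed two-way split of the day; a real split would need more thought.
items["day_part"] = items["hour"].map(lambda h: "morning" if h < 12 else "evening")

# One feature per (item, day part) combination, e.g. "pepsi_evening".
items["feature"] = items["item_bought"] + "_" + items["day_part"]

# One row per customer, one column per feature, total quantity as the value.
customer_features = items.pivot_table(
    index="customer_id",
    columns="feature",
    values="quantity",
    aggfunc="sum",
    fill_value=0,
)
print(customer_features)
```

A table like `customer_features` could be passed to a clustering algorithm, but it still drops which items were bought together in the same transaction, which is part of what I'm worried about losing.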

