Aggregating transactional data for customer segmentation
I have item-level transactional data where each row represents a different item bought by a customer in a transaction. So if two different items were bought in the same transaction by the same customer, there would be two rows where the `customer_id` and `transaction_id` columns have the same value.
For example:
customer_id | transaction_id | item_bought | quantity |
---|---|---|---|
a | 00001 | cheese | 2 |
b | 00002 | ham | 1 |
b | 00002 | pepsi | 2 |
In this case customer `b` bought two items in the same transaction, so there are two rows with the same value in both the `customer_id` and `transaction_id` columns.
I want to be able to cluster customers based on the sorts of items that they buy and other factors such as the time of day that they purchase items.
To do this, do I have to aggregate the data so that each customer is represented by a single row, or is it possible to set up my model in such a way that I don't have to aggregate the data? My concern is that I would also like to look at behaviour at the `transaction_id` level (e.g. this customer always buys a coffee in the morning and a pizza at night), and if I aggregate the data to the `customer_id` level I'll lose that detail.
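
To make the two levels of aggregation concrete, here is a minimal sketch of what I mean, assuming pandas and using only the toy rows from the table above. The variable names `customer_level` and `transaction_level` are just illustrative, and summing quantities per item is one possible choice of feature, not something I'm committed to.

```python
import pandas as pd

# Toy data copied from the example table above
df = pd.DataFrame({
    "customer_id": ["a", "b", "b"],
    "transaction_id": ["00001", "00002", "00002"],
    "item_bought": ["cheese", "ham", "pepsi"],
    "quantity": [2, 1, 2],
})

# One row per customer: total quantity of each item across all transactions
customer_level = df.pivot_table(
    index="customer_id",
    columns="item_bought",
    values="quantity",
    aggfunc="sum",
    fill_value=0,
)

# One row per transaction: the same features, but the basket detail is kept
transaction_level = df.pivot_table(
    index=["customer_id", "transaction_id"],
    columns="item_bought",
    values="quantity",
    aggfunc="sum",
    fill_value=0,
)

print(customer_level)
print(transaction_level)
```

With this toy data the two tables look alike because each customer has only one transaction. With real data, `customer_level` would collapse all of a customer's baskets into one row, which is exactly where the per-transaction detail would be lost.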