Need suggestions on customer segmentation

I have been tasked with performing customer segmentation for a business-to-business (B2B) use case, based on customer purchase history. Could the experts here give me some input on how to proceed with customer segmentation for the following dataset?

Dataset details which have been provided to me

Hierarchy 3, 4, and 5 define the categories under which a product falls.

Edit: I also need input on how to select features for my clustering algorithm.

Topic dimensionality-reduction clustering

Category Data Science


What @shepan6 said... but one other thing.

Since you're grouping customers, you'll want to aggregate your dataset so that each row is a customer (not a single transaction).

Your new columns might look like this, prior to your clustering exercise:

  • customerid
  • days_since_prior_transaction
  • num_transactions_ever
  • num_transactions_last_180_days
  • num_online_sales
  • num_store_sales
  • region
  • num_dinnerware_purchases
  • num_tableware_purchases
  • num_porcelain_purchases
  • num_porc_dinnerware_set_purchases
  • num_CATEGORY1_purchases
  • diff_types_categories_purchased
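
A rough sketch of that aggregation in pandas, in case it helps. The transaction-level column names (`date`, `channel`, `category`), the sample rows, and the `as_of` reference date are all made up for illustration; your dataset will have its own names and hierarchy levels.

```python
import pandas as pd

# Hypothetical transaction-level data; column names and values are assumptions.
transactions = pd.DataFrame({
    "customerid": [1, 1, 2, 2, 2, 3],
    "date": pd.to_datetime([
        "2023-01-05", "2023-06-20", "2023-02-11",
        "2023-05-02", "2023-06-28", "2022-11-30",
    ]),
    "channel": ["online", "store", "online", "online", "store", "store"],
    "category": ["dinnerware", "tableware", "porcelain",
                 "dinnerware", "dinnerware", "tableware"],
})

# Assumed reference date used for the recency features.
as_of = pd.Timestamp("2023-07-01")


def build_customer_table(tx: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Collapse one-row-per-transaction data into one row per customer."""
    tx = tx.assign(
        is_online=tx["channel"].eq("online"),
        is_store=tx["channel"].eq("store"),
        is_recent=tx["date"] >= as_of - pd.Timedelta(days=180),
    )
    customers = tx.groupby("customerid").agg(
        last_date=("date", "max"),
        num_transactions_ever=("date", "size"),
        num_transactions_last_180_days=("is_recent", "sum"),
        num_online_sales=("is_online", "sum"),
        num_store_sales=("is_store", "sum"),
        diff_types_categories_purchased=("category", "nunique"),
    )
    # Days between the reference date and each customer's latest transaction.
    customers["days_since_prior_transaction"] = (
        as_of - customers.pop("last_date")
    ).dt.days

    # One count column per product category.
    per_category = pd.crosstab(tx["customerid"], tx["category"])
    per_category.columns = [f"num_{c}_purchases" for c in per_category.columns]

    return customers.join(per_category).reset_index()


customer_table = build_customer_table(transactions, as_of)
print(customer_table)
```

Categorical attributes such as region are not transaction counts, so they would be merged in separately from whatever customer master data you have.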

So the question is about how to perform customer segmentation on this data.

When I do any customer segmentation, the first thing I ask myself is whether I know the number of segments before the analysis.

If I do,

Then I would use a clustering method like K-means clustering (https://towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1), where k is the number of customer segments.
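
For illustration, a minimal K-means sketch with scikit-learn. The data here is synthetic (three made-up groups of customers with two numeric features), and standardising the features first is my own assumption rather than something the dataset requires; it just keeps features with large ranges from dominating the distance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic customer vectors (assumed data): 3 groups of 30 customers,
# 2 numeric features each, e.g. transaction count and recency.
X = np.vstack([
    rng.normal(loc=[2, 1], scale=0.3, size=(30, 2)),
    rng.normal(loc=[20, 5], scale=0.3, size=(30, 2)),
    rng.normal(loc=[50, 30], scale=0.3, size=(30, 2)),
])

# Put every feature on a comparable scale before computing distances.
X_scaled = StandardScaler().fit_transform(X)

k = 3  # number of customer segments, assumed known in advance
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
segments = kmeans.fit_predict(X_scaled)  # one segment label per customer

print(np.bincount(segments))
```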

If I do not,

Then I would use something like agglomerative clustering (https://www.datanovia.com/en/lessons/agglomerative-hierarchical-clustering/).
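
A small sketch of the agglomerative route with scikit-learn, again on synthetic data. Note that you still have to choose where to cut the tree: here that is a distance threshold, and both the value 3.0 and the average linkage are assumptions picked to suit this toy data, not recommendations.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)

# Synthetic customer vectors (assumed data), already on comparable scales.
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.3, size=(30, 2)),
    rng.normal(loc=[5, 5], scale=0.3, size=(30, 2)),
    rng.normal(loc=[10, 0], scale=0.3, size=(30, 2)),
])

# n_clusters=None lets the distance threshold decide how many segments
# come out: merging stops once clusters are farther apart than the threshold.
model = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=3.0,  # assumed cut-off for this toy data
    linkage="average",
)
labels = model.fit_predict(X)

print(model.n_clusters_)  # number of segments found by the cut
```

In practice you would inspect a dendrogram or try several thresholds before settling on a cut.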

When it comes to data representation, you would represent a customer and their (purchasing) behaviours as a vector of values (which I will refer to as the customer vector).

If the variables are numerical (e.g. number of items purchased), then you can put the numerical values directly in the customer vector. If the variables are categorical (e.g. products purchased), then you concatenate a one-hot encoded vector (https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/) of that variable to the customer vector.
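
As a quick sketch of building customer vectors with pandas: the columns and values below are invented for the example, with `region` standing in for any categorical variable.

```python
import pandas as pd

# Hypothetical customer-level table; columns and values are assumptions.
customers = pd.DataFrame({
    "customerid": [1, 2, 3],
    "num_transactions_ever": [12, 3, 7],
    "region": ["north", "south", "north"],
})

# Numeric columns pass through unchanged; the categorical column is replaced
# by one 0/1 column per category (region_north, region_south).
encoded = pd.get_dummies(customers, columns=["region"], dtype=float)

# Drop the identifier so it is not treated as a feature, then convert to an
# array with one customer vector per row.
customer_vectors = encoded.drop(columns="customerid").to_numpy(dtype=float)

print(encoded.columns.tolist())
print(customer_vectors)
```

The identifier is dropped on purpose: a customer ID carries no behavioural information and would distort the distances used for clustering.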
