Interpreting cluster variables - raw vs scaled
I have already referred to these posts here and here. I also posted here, but since there was no response, I am posting here.
Currently, I am working on customer segmentation using their purchase data.
So, my data has the following info for each customer: revenue (monetary value), recency (in days), and tenure.
Based on the above-linked posts, I see that for clustering we have to scale the variables if they are in different units, etc.
But if I scale/normalize all of them to a uniform scale, wouldn't I lose the information that actually differentiates the customers from one another? On the other hand, I also understand that the monetary value could be construed as carrying very high weight in the model, because it might range up to 100K or even millions.
Let's assume that I normalized the data and my clustering returned 3 clusters. How do I answer the below questions meaningfully?
q1) What is the average revenue from customers who are under cluster 1?
q2) What is the average recency (in days) for a customer from cluster 2?
q3) What is the average age of a customer with us (tenure) under cluster 3?
Answering all of the above questions using normalized data wouldn't make sense, because the values would all be on a uniform scale (mean 0, sd 1, etc.).
So, I was wondering whether it is meaningful to do the below:
a) cluster using the normalized/scaled variables
b) once clusters are identified, use the customer_id under each cluster to get the original variable values (from the input dataframe before normalization) and interpret the clusters from those (see the sketch below)
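Something like this is what I have in mind. This is only a minimal sketch: the column names, the made-up data, and the choice of StandardScaler and KMeans with 3 clusters are illustrative assumptions, not my actual setup.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Tiny synthetic example; in practice df would be the real purchase dataframe
# and the column names are placeholders for illustration only.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "revenue": [120.0, 90.0, 150000.0, 130000.0, 50.0, 110000.0],
    "recency_days": [5, 12, 40, 55, 3, 60],
    "tenure_days": [300, 200, 900, 1000, 150, 800],
})
features = ["revenue", "recency_days", "tenure_days"]

# a) cluster on the scaled variables so no single unit dominates the distance
X_scaled = StandardScaler().fit_transform(df[features])
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# b) interpret the clusters using the ORIGINAL, unscaled values
print(df.groupby("cluster")[features].mean())  # average revenue/recency/tenure per cluster
```

My understanding is that the groupby on the original columns would give the per-cluster averages in real units (dollars, days), while the distance computation inside the clustering only ever sees the scaled values.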
So, do you think this would allow me to answer my questions in a meaningful way?
Is this how data scientists interpret clusters? Do they always have to link back to the input dataframe?
Topic predictive-modeling k-means clustering data-mining machine-learning
Category Data Science