Clustering based on features of varied importance
Suppose I have a dataset with the features {HairColor, EyeColor, EducationLevel, Income}. I would like to cluster it into smaller groups whose members I would expect to behave similarly. The difficulty is that EducationLevel and Income are clearly much more important than HairColor and EyeColor, but I do not know how to quantify that importance or how to make the clustering take it into account.
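In the example below, I would want the clustering to treat Row 1 as more similar to Row 3 than to Row 2, because Rows 1 and 3 agree on EducationLevel and Income and differ only in the less important features.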
| ID | EyeColor | HairColor | EducationLevel | Income |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 2 | 2 |
| 3 | 2 | 2 | 1 | 1 |
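To make the goal concrete, here is a minimal sketch of the behavior I am after, using a weighted Euclidean distance on the three rows above. The weights are hypothetical values I picked by hand (they are exactly what I do not know how to choose), and treating the category codes as plain numbers is also an assumption made only for illustration.

```python
import numpy as np

# Rows 1-3 from the table above.
# Columns: EyeColor, HairColor, EducationLevel, Income
X = np.array([
    [1, 1, 1, 1],
    [1, 1, 2, 2],
    [2, 2, 1, 1],
], dtype=float)

# Hypothetical per-feature weights (an assumption, not learned from data):
# EducationLevel and Income count ten times as much as EyeColor and HairColor.
weights = np.array([0.1, 0.1, 1.0, 1.0])


def weighted_distance(a, b, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (a_i - b_i)^2)."""
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))


d12 = weighted_distance(X[0], X[1], weights)  # Row 1 vs Row 2
d13 = weighted_distance(X[0], X[2], weights)  # Row 1 vs Row 3

# Without weights the two distances are identical, so an unweighted
# distance cannot express the preference described above.
ones = np.ones_like(weights)
u12 = weighted_distance(X[0], X[1], ones)
u13 = weighted_distance(X[0], X[2], ones)

# Scaling each column by sqrt(weight) gives the same distances under the
# ordinary Euclidean metric, so the scaled data could be passed to any
# standard Euclidean clustering algorithm.
X_scaled = X * np.sqrt(weights)

print(f"weighted:   d(1,2)={d12:.3f}  d(1,3)={d13:.3f}")
print(f"unweighted: d(1,2)={u12:.3f}  d(1,3)={u13:.3f}")
```

With these hand-picked weights Row 1 ends up closer to Row 3 than to Row 2, which is the outcome I want. What I am missing is a principled way to arrive at such weights.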
Topic clustering
Category Data Science