Will one-hot encoding / unbalanced columns bias a clustering analysis?
I'm wondering whether having many columns that describe the same underlying feature will bias a clustering analysis.
For example, suppose my dataset has columns = ['incoming calls', 'outgoing calls', 'missing calls', 'age'] and I run a clustering algorithm such as K-means or a mixture model. Will the clustering results be biased because three of the four columns describe calls, so the data gets split mainly on call behaviour?
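To make the concern concrete, here is a minimal sketch of how much each column contributes to the squared Euclidean distance that K-means minimizes. The data is synthetic and the distributions (Poisson call counts, uniform ages) are my own assumptions for illustration, not real measurements. After z-scoring, every column contributes the same amount on average, so the three call columns together account for three quarters of the total.

```python
import numpy as np

# Synthetic data; the distributions below are assumptions for illustration only.
rng = np.random.default_rng(0)
n = 200
columns = ["incoming calls", "outgoing calls", "missing calls", "age"]
X = np.column_stack([
    rng.poisson(20, n),       # incoming calls
    rng.poisson(15, n),       # outgoing calls
    rng.poisson(5, n),        # missing calls
    rng.integers(18, 80, n),  # age
]).astype(float)

# Z-score each column so all columns are on the same scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Squared difference between every pair of rows, kept separate per column.
diffs = Z[:, None, :] - Z[None, :, :]            # shape (n, n, 4)
per_column = (diffs ** 2).mean(axis=(0, 1))      # average contribution per column

# Fraction of the average squared distance that comes from the call columns.
call_share = per_column[:3].sum() / per_column.sum()

print(dict(zip(columns, per_column.round(3))))
print("share from call columns:", round(call_share, 3))
```

Without the z-scoring step the shares would instead depend on each column's raw variance, so the column with the widest spread would dominate regardless of how many call columns there are.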
Another example: suppose I have two categorical columns, color ('red','blue','green') and shape ('circle','square'). After one-hot encoding, color expands into three columns and shape into two. If I cluster on the one-hot encoded dataset, will color carry more weight than shape when the data is split?
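For the color/shape case, a small sketch of what the one-hot vectors look like and how a mismatch in each attribute shows up in the squared Euclidean distance. The helper names (`one_hot`, `encode`, `sq_dist`) are hypothetical and the encoding is done by hand with NumPy rather than with any particular library's encoder.

```python
import numpy as np

COLORS = ["red", "blue", "green"]
SHAPES = ["circle", "square"]

def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return np.array([1.0 if value == c else 0.0 for c in categories])

def encode(color, shape):
    """Concatenate the one-hot vectors: 3 color columns, then 2 shape columns."""
    return np.concatenate([one_hot(color, COLORS), one_hot(shape, SHAPES)])

def sq_dist(a, b):
    """Squared Euclidean distance between two encoded rows."""
    return float(np.sum((a - b) ** 2))

base = encode("red", "circle")
color_only = sq_dist(base, encode("blue", "circle"))  # only color differs
shape_only = sq_dist(base, encode("red", "square"))   # only shape differs
both = sq_dist(base, encode("blue", "square"))        # both differ

print(color_only, shape_only, both)  # 2.0 2.0 4.0
```

In this sketch a mismatch flips exactly two columns whichever attribute it is in, so a single color mismatch and a single shape mismatch add the same amount to the distance even though color occupies more columns. This only looks at the raw 0/1 encoding; rescaling the one-hot columns afterwards, or having very unbalanced category frequencies, would change the picture.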
Topic one-hot-encoding k-means clustering data-mining machine-learning
Category Data Science