Should I scale or normalise my dataset before clustering?

Question

Karan Khurana

May 17, 2021, 10:53

I have a dataset whose variables are measured in different units: milligrams, kilograms and quintals. Should I use StandardScaler or MinMaxScaler to scale the dataset?

Topic feature-scaling hierarchical-data-format k-means clustering

Category Data Science

Sammy · Accepted Answer · May 17, 2021, 10:53

As so often in machine learning, there is no single clear answer. You should scale in either case: k-means relies on Euclidean distances, so without scaling the variable with the largest numeric range would dominate the clustering. Both standardisation and min-max scaling are valid options [1, p. 116]. For k-means, however, min-max scaling is usually used in practice [2], so it would be the default choice and it's what I'd recommend. That said, you can simply try both and see which gives better results, i.e. better internal cluster validation measures such as the Silhouette Index.
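To make the "try both" suggestion concrete, here is a minimal sketch assuming scikit-learn. The synthetic data (three features standing in for milligrams, kilograms and quintals) and the helper name `compare_scalers` are made up for illustration; substitute your own feature matrix and number of clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: two groups, three features on very different scales.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 100)
X = np.column_stack([
    rng.normal(5000 + 3000 * group, 500),  # stand-in for milligrams
    rng.normal(2 + 3 * group, 0.5),        # stand-in for kilograms
    rng.normal(0.5 + 1.0 * group, 0.1),    # stand-in for quintals
])


def compare_scalers(X, n_clusters=2, seed=0):
    """Run k-means after each scaler and return the silhouette score per scaler."""
    scores = {}
    for name, scaler in [("standard", StandardScaler()), ("minmax", MinMaxScaler())]:
        X_scaled = scaler.fit_transform(X)
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        labels = model.fit_predict(X_scaled)
        # The silhouette is computed in the same scaled space used for clustering.
        scores[name] = silhouette_score(X_scaled, labels)
    return scores


scores = compare_scalers(X)
for name, score in scores.items():
    print(f"{name}: silhouette = {score:.3f}")
```

Keep in mind that each score is computed in its own scaled feature space, so treat small differences between the two as a rough guide rather than a verdict.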

References:

[1] https://dbs.ifi.uni-heidelberg.de/files/Team/eschubert/lectures/KDDClusterAnalysis17-screen.pdf

[2] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173
