Should I scale or normalise my dataset before clustering?

I have a dataset whose variables are measured in different units: milligrams, kilograms and quintals. Should I use StandardScaler or MinMaxScaler to scale the dataset?

Topic feature-scaling hierarchical-data-format k-means clustering

Category Data Science


As is often the case in machine learning, there is no single right answer. You should scale in some way, because k-means relies on Euclidean distances and a variable with a large numeric range would otherwise dominate them. Which scaler you pick is less clear-cut: both are valid options [1, p. 116]. For k-means, however, min-max scaling is usually used in practice [2], so it would be the default choice and it's what I'd recommend. You can also simply try both and keep whichever gives better results, i.e. better internal cluster validation measures such as the Silhouette index.
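A minimal sketch of the "try both" approach, assuming scikit-learn. The helper name compare_scalers, the synthetic data and the choice of three clusters are all made up for illustration; substitute your own feature matrix and number of clusters. Note that each silhouette value is computed in its own scaled space, so treat the comparison as a rough guide rather than a strict ranking.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def compare_scalers(X, n_clusters, random_state=0):
    """Run k-means on each scaled copy of X and return {scaler name: silhouette score}."""
    scores = {}
    for name, scaler in [("standard", StandardScaler()), ("minmax", MinMaxScaler())]:
        X_scaled = scaler.fit_transform(X)
        kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
        labels = kmeans.fit_predict(X_scaled)
        # The silhouette is evaluated on the same scaled data that was clustered
        scores[name] = silhouette_score(X_scaled, labels)
    return scores


# Synthetic stand-in data (assumption): three blobs whose columns are stretched
# to very different magnitudes, mimicking milligram / kilogram / quintal units.
X, _ = make_blobs(n_samples=300, centers=3, n_features=3, random_state=0)
X = X * np.array([1e6, 1.0, 1e-2])

scores = compare_scalers(X, n_clusters=3)
print(scores)
```

A higher silhouette value (closer to 1) indicates more compact, better separated clusters for that scaling.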

References:

[1] https://dbs.ifi.uni-heidelberg.de/files/Team/eschubert/lectures/KDDClusterAnalysis17-screen.pdf

[2] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173
