Determining the optimal number of clusters with the elbow method

I have a dataset that consists of 700 categorical columns and around 6000 rows. I ran the k-modes algorithm with 2 to 50 clusters and plotted the cost function to determine the optimal number of clusters.

This is what the plot looks like:

[Plot: k-modes cost against the number of clusters, k = 2 to 50]

I am unsure how to determine the optimal number of clusters. The cost function seems to converge at 48 clusters, which seems like a lot considering I have only 700 categorical columns. On the other hand, at 24 clusters the curve seems to become less steep.

Could someone shed some light on this? How should I analyse the plot correctly?
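For reference, the "cost" that k-modes reports is the sum, over all rows, of the simple-matching dissimilarity (the number of mismatched columns) between each row and the mode of its cluster. The following is a minimal pure-Python sketch of that idea, not the implementation used in the question. It assumes random-row initialisation and a fixed number of iterations, and `matching_dissimilarity`, `kmodes` and `cost_curve` are hypothetical names chosen for illustration.

```python
import random
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the columns in which two categorical rows differ."""
    return sum(x != y for x, y in zip(a, b))


def kmodes(rows, k, n_iter=10, seed=0):
    """Minimal k-modes sketch; returns (modes, labels, cost).

    Assumption: modes start as k randomly sampled rows. Libraries use
    smarter initialisations (e.g. Huang or Cao) and several restarts.
    """
    rng = random.Random(seed)
    modes = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: each row joins its nearest mode.
        labels = [
            min(range(k), key=lambda j: matching_dissimilarity(row, modes[j]))
            for row in rows
        ]
        # Update step: each mode becomes the most frequent category per column.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                modes[j] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    # Cost: total mismatches between each row and the mode of its cluster.
    cost = sum(matching_dissimilarity(row, modes[lab]) for row, lab in zip(rows, labels))
    return modes, labels, cost


def cost_curve(rows, k_values, **kwargs):
    """Final k-modes cost for each k: the y-values of an elbow plot."""
    return [kmodes(rows, k, **kwargs)[2] for k in k_values]


# Toy data: three well-separated groups of identical rows.
data = [["a", "x", "p"]] * 4 + [["b", "y", "q"]] * 4 + [["c", "z", "r"]] * 4
print(cost_curve(data, range(1, 6)))
```

Because the initialisation is random, a single run per k can give a jagged curve. Averaging or taking the minimum cost over several seeds per k usually makes the elbow easier to read.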

Topic optimization clustering machine-learning

Category Data Science


With the elbow method, you can determine the number of clusters quantitatively and automatically (rather than by eye) if you introduce a quantity called the "elbow strength". Essentially, it is based on the derivative of the elbow plot, combined with some additional tricks that enhance the information it carries. More details about the elbow strength can be found in the supplementary information of the following publication:

https://iopscience.iop.org/article/10.1088/2632-2153/abd87c
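The elbow strength itself is defined in the paper's supplementary information and is not reproduced here. As a much simpler stand-in for the same "derivative of the elbow plot" idea, the sketch below uses a different, common heuristic: it picks the k at which the discrete second difference of the cost curve is largest, i.e. where the slope flattens most abruptly. `elbow_by_second_difference` is a hypothetical helper, and the costs in the usage line are made-up illustration values, not data from the question.

```python
def elbow_by_second_difference(ks, costs):
    """Return the k at which the cost curve bends most sharply.

    Heuristic (NOT the paper's 'elbow strength'): for each interior point,
    compute the discrete second difference
        costs[i-1] - 2*costs[i] + costs[i+1],
    which is large when the slope flattens abruptly at k = ks[i].
    Assumes the ks are evenly spaced and the costs roughly decreasing.
    """
    if len(ks) != len(costs) or len(ks) < 3:
        raise ValueError("need at least three (k, cost) pairs of equal length")
    best_i = max(
        range(1, len(ks) - 1),
        key=lambda i: costs[i - 1] - 2 * costs[i] + costs[i + 1],
    )
    return ks[best_i]


# Illustrative curve: steep drop up to k = 3, nearly flat afterwards.
print(elbow_by_second_difference([1, 2, 3, 4, 5, 6], [100, 60, 25, 22, 20, 19]))  # → 3
```

Second differences amplify noise, so on a jagged curve (such as one from randomly initialised k-modes) it helps to average the cost over several restarts, or smooth it, before applying this.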

Alternatively, you can try the silhouette method:

https://en.wikipedia.org/wiki/Silhouette_(clustering)
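The silhouette method needs a distance. For categorical data, one natural choice (an assumption here) is the simple-matching dissimilarity that k-modes itself minimises. For each row, a is its mean dissimilarity to the rest of its own cluster, b is its mean dissimilarity to the nearest other cluster, and its silhouette is (b - a) / max(a, b). The minimal pure-Python sketch below averages this over all rows. `silhouette_score_categorical` is a hypothetical name; singleton clusters get a score of 0 by the usual convention.

```python
def matching_dissimilarity(a, b):
    """Count the columns in which two categorical rows differ."""
    return sum(x != y for x, y in zip(a, b))


def silhouette_score_categorical(rows, labels):
    """Mean silhouette coefficient using simple-matching dissimilarity."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (row, lab) in enumerate(zip(rows, labels)):
        # Sum and count of dissimilarities from this row to every cluster,
        # excluding the row itself.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, (other, other_lab) in enumerate(zip(rows, labels)):
            if i != j:
                sums[other_lab] += matching_dissimilarity(row, other)
                counts[other_lab] += 1
        if counts[lab] == 0:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = sums[lab] / counts[lab]  # cohesion: own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != lab)  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


data = [["a", "x"], ["a", "x"], ["a", "y"], ["b", "z"], ["b", "z"], ["b", "w"]]
print(silhouette_score_categorical(data, [0, 0, 0, 1, 1, 1]))  # well-separated split
print(silhouette_score_categorical(data, [0, 1, 0, 1, 0, 1]))  # poor split
```

You would compute this for each candidate k and favour values of k with a higher mean silhouette. The double loop costs O(n² · columns), which for 6000 rows × 700 columns is far too slow in pure Python, so in practice a vectorised implementation or a random subsample of rows would be needed.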


About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.