How to choose the optimal k in k-prototypes?

I am analyzing a banking dataset that contains both numerical and categorical variables. I transformed the variables so that I can cluster them with k-prototypes.

The original dataset:

The modified dataset:

  • E.g.: Job is recoded as the integers 1 to 12, because it has 12 levels.
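A recoding of this kind can be sketched in base R as follows. The job labels are made-up placeholders, not values from the actual dataset. One point worth keeping in mind is that kproto() from clustMixType expects a data frame of numeric and factor columns, so integer codes left as plain integers would be handled as numbers rather than as categories.

```r
# Hypothetical example values; the real Job column has 12 levels.
job <- c("admin", "services", "technician", "admin", "retired")

# Categorical representation: one level per distinct label.
job_factor <- factor(job)

# Integer codes 1..nlevels, assigned in alphabetical level order.
job_codes <- as.integer(job_factor)

# Wrapping the codes in factor() keeps the column categorical,
# so a mixed-type method would not treat 1..12 as a numeric scale.
job_recoded <- factor(job_codes)

str(data.frame(job_factor, job_codes, job_recoded))
```

If the integer codes are only needed for display, keeping the original factor column in the data frame passed to the clustering step avoids the issue altogether.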

Should I scale the dataset before doing the k-prototypes?

How can I determine the optimal k in code?

I was thinking of running:

library(clustMixType)

lbd <- lambdaest(BPor)

kpres <- kproto(BPor, 5, lambda = lbd) # Change 5 to each candidate value of k.

print(kpres)

Then, for each k, I would compute the total within-cluster error and choose the k with the smallest value.
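The loop over k described here could look roughly like the sketch below. It is only an illustration under stated assumptions: `toy` is a simulated stand-in for BPor with invented column names, the range 2 to 8 for k is arbitrary, and `nstart = 5` is just one possible setting. The `tot.withinss` component of the fitted object holds the total within-cluster distance.

```r
library(clustMixType)

set.seed(42)

# Hypothetical stand-in for BPor: two numeric and two categorical columns.
n <- 200
toy <- data.frame(
  age     = round(rnorm(n, mean = 40, sd = 10)),
  balance = rnorm(n, mean = 1500, sd = 800),
  job     = factor(sample(c("admin", "services", "technician"), n, replace = TRUE)),
  marital = factor(sample(c("married", "single"), n, replace = TRUE))
)

# Estimate lambda once so that every k is fitted with the same weighting
# between the numeric and the categorical part of the distance.
lbd <- lambdaest(toy)

# Candidate numbers of clusters (an arbitrary range for illustration).
ks <- 2:8

# Fit k-prototypes for each k and keep the total within-cluster distance.
tot_withinss <- sapply(ks, function(k) {
  fit <- kproto(toy, k, lambda = lbd, nstart = 5, verbose = FALSE)
  fit$tot.withinss
})

# Scree ("elbow") plot of the criterion against k.
plot(ks, tot_withinss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster distance")
```

Because the total within-cluster distance tends to keep shrinking as k grows, its minimum usually sits at the largest k tried. A common reading of such a plot is therefore to look for the "elbow" where the decrease levels off, rather than for the smallest value.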
