How to choose the optimal k in k-protoypes?

Sergi F.

2021年10月6日 10:47

To analyze a dataset from banking I have both numerical and categorical values. I transform them to analyze with k-prototypes.

The original dataset:

The modified dataset:

Should I scale the dataset before doing the k-prototypes?

How could I determine the optimal "k" to choose (coding)?

I thought to execute:

library(clustMixType)

lbd - lambdaest(BPor)

kpres - kproto(BPor, 5, lambda = lbd) #Change '5' for every possible value of k.

print(kpres)

And then, calculate the sum of within cluster error (choosing the little one).