K-Means clustering for mixed numeric and categorical data
My data set contains several numeric attributes and one categorical attribute:
NumericAttr1, NumericAttr2, ..., NumericAttrN, CategoricalAttr,
where CategoricalAttr takes one of three possible values: CategoricalAttrValue1, CategoricalAttrValue2, or CategoricalAttrValue3.
I'm using the default k-means clustering implementation for Octave, which works with numeric data only.
So my question is: is it correct to split the categorical attribute CategoricalAttr into three numeric (binary) variables, such as IsCategoricalAttrValue1, IsCategoricalAttrValue2, and IsCategoricalAttrValue3?
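To make the proposed encoding concrete, here is a minimal sketch of what I mean. It is written in plain Python rather than Octave purely for illustration. The sample rows and the helper names (`one_hot`, `encode_row`) are made up for this example, and only two numeric attributes are shown.

```python
# Hypothetical sample rows: (NumericAttr1, NumericAttr2, CategoricalAttr).
# The values are invented for illustration only.
rows = [
    (1.0, 20.0, "CategoricalAttrValue1"),
    (2.5, 18.0, "CategoricalAttrValue3"),
    (0.7, 25.0, "CategoricalAttrValue2"),
]

# Fixed category order; each entry becomes one binary column:
# IsCategoricalAttrValue1, IsCategoricalAttrValue2, IsCategoricalAttrValue3.
CATEGORIES = [
    "CategoricalAttrValue1",
    "CategoricalAttrValue2",
    "CategoricalAttrValue3",
]


def one_hot(value, categories=CATEGORIES):
    """Return one 0.0/1.0 indicator per category, in the given order."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def encode_row(row):
    """Keep the numeric attributes as-is and append the indicator columns."""
    *numeric, categorical = row
    return list(numeric) + one_hot(categorical)


# Purely numeric matrix that a numeric-only k-means routine could accept.
encoded = [encode_row(r) for r in rows]

for r in encoded:
    print(r)
```

Each encoded row has exactly one indicator set to 1. Since k-means relies on Euclidean distance, I assume the relative scale of the numeric columns and the 0/1 columns would affect the resulting clusters.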
Topic categorical-data k-means octave clustering data-mining
Category Data Science
