K-Means clustering for mixed numeric and categorical data
My data set contains a number of numeric attributes and one categorical attribute.
Say, NumericAttr1, NumericAttr2, ..., NumericAttrN, CategoricalAttr, where CategoricalAttr takes one of three possible values: CategoricalAttrValue1, CategoricalAttrValue2 or CategoricalAttrValue3.
I'm using the default k-means clustering implementation for Octave, which works with numeric data only.
So my question: is it correct to split the categorical attribute CategoricalAttr into three numeric (binary) variables, like IsCategoricalAttrValue1, IsCategoricalAttrValue2, IsCategoricalAttrValue3?
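To make the proposed split concrete, here is a minimal sketch of the encoding, written in plain Python rather than Octave purely for illustration. The helper name one_hot and the sample rows are hypothetical; only the attribute and value names come from the description above. It assumes the categorical attribute is the last field of each row.

```python
def one_hot(rows, categories):
    """Replace the trailing categorical field of each row with binary indicator columns."""
    encoded = []
    for *numeric, label in rows:
        # One indicator per category: 1.0 for the matching value, 0.0 otherwise.
        indicators = [1.0 if label == c else 0.0 for c in categories]
        encoded.append([float(x) for x in numeric] + indicators)
    return encoded


# The three possible values of CategoricalAttr, in a fixed column order.
categories = [
    "CategoricalAttrValue1",
    "CategoricalAttrValue2",
    "CategoricalAttrValue3",
]

# Hypothetical sample rows: (NumericAttr1, NumericAttr2, CategoricalAttr).
rows = [
    (1.5, 20, "CategoricalAttrValue1"),
    (0.3, 35, "CategoricalAttrValue3"),
]

encoded = one_hot(rows, categories)
for row in encoded:
    print(row)
```

Each encoded row ends with the three columns IsCategoricalAttrValue1, IsCategoricalAttrValue2, IsCategoricalAttrValue3, exactly one of which is 1. Note that under Euclidean distance two rows with different categories differ by sqrt(2) across these columns, so how the numeric attributes are scaled relative to that value will influence the resulting clusters.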
Topic categorical-data k-means octave clustering data-mining
Category Data Science