Convert nominal to numeric variables?
I am trying to develop an algorithm with sklearn and TensorFlow to predict which car can be offered to each customer.
To do that, I have a database with the answers to a survey of 1000 customers.
Some example questions/[Answers] are:
- Color/[Green,Red,Blue]
- NumberOfPax/[2,4,5,6,7]
- HorsePower/[Integer]
- InsuranceIncluded/[Yes,No,Don't know]
As you can see, all answers are predefined: where an answer is open, I validate that the value is an integer; otherwise it is chosen with a radio button.
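A minimal sketch of the validation described above, assuming the answers arrive as strings from a web form. The `SCHEMA` dictionary and the `validate_answer` function are hypothetical names used only for illustration; closed questions accept one of their predefined options, and the open `HorsePower` question accepts anything that parses as an integer.

```python
# Hypothetical survey schema: a list means a closed (radio button) question,
# int means an open question whose answer must be an integer.
SCHEMA = {
    "Color": ["Green", "Red", "Blue"],
    "NumberOfPax": [2, 4, 5, 6, 7],
    "HorsePower": int,
    "InsuranceIncluded": ["Yes", "No", "Don't know"],
}


def validate_answer(question, raw):
    """Return the validated answer, or raise ValueError if it is not allowed."""
    rule = SCHEMA[question]
    if rule is int:
        # Open answer: must parse as an integer.
        try:
            return int(raw)
        except (TypeError, ValueError):
            raise ValueError(f"{question}: {raw!r} is not an integer") from None
    # Closed answer: compare as strings so form input like "4" matches 4,
    # then return the option in its canonical type.
    for option in rule:
        if str(option) == str(raw):
            return option
    raise ValueError(f"{question}: {raw!r} is not one of {rule}")


print(validate_answer("HorsePower", "150"))  # 150
print(validate_answer("NumberOfPax", "4"))   # 4
```

Storing the allowed options in one place also gives a fixed, known category list per question, which is what any later encoding step needs.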
The purpose of this design is that, despite the categorical variables, I can easily use sklearn to cluster the data.
Would it be a good approach to translate these categories into numerical values as an internal step and then cluster using these codes?
For example: Yes -- 0; No -- 1; Don't know -- 2
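The mapping proposed above can be sketched in plain Python as follows (scikit-learn's `sklearn.preprocessing.OrdinalEncoder` performs the same kind of category-to-integer mapping). The `INSURANCE_CODES` and `dist` names are illustrative assumptions. The sketch also shows a side effect worth weighing: a distance-based method such as k-means compares the integers directly, so the arbitrary order of the codes becomes a distance.

```python
# Codes from the example above: an order imposed on unordered answers.
INSURANCE_CODES = {"Yes": 0, "No": 1, "Don't know": 2}

answers = ["Yes", "No", "Don't know", "Yes"]
codes = [INSURANCE_CODES[a] for a in answers]
print(codes)  # [0, 1, 2, 0]


def dist(a, b):
    """Absolute difference between the codes of two answers."""
    return abs(INSURANCE_CODES[a] - INSURANCE_CODES[b])


# With these codes "Yes" looks twice as far from "Don't know" as from "No",
# although the survey answers carry no such ordering.
print(dist("Yes", "No"), dist("Yes", "Don't know"))  # 1 2
```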
Then sklearn would cluster with all variables as numerical values.
I have considered this possibility because I believe that sklearn cannot cluster nominal data.
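For comparison, a commonly used alternative to integer codes is one-hot encoding, available in scikit-learn as `sklearn.preprocessing.OneHotEncoder`. The plain-Python sketch below (the `one_hot` and `euclidean` helpers are hypothetical) shows the idea for the `InsuranceIncluded` question: each answer becomes a 0/1 vector, so every pair of distinct answers ends up the same Euclidean distance apart and no artificial order is introduced.

```python
import math

CATEGORIES = ["Yes", "No", "Don't know"]


def one_hot(answer, categories=CATEGORIES):
    """Encode one answer as a 0/1 vector with a single 1 at its category."""
    return [1.0 if answer == c else 0.0 for c in categories]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


yes, no, dont_know = (one_hot(a) for a in CATEGORIES)
print(one_hot("No"))  # [0.0, 1.0, 0.0]

# All three pairwise distances are sqrt(2).
print(euclidean(yes, no), euclidean(yes, dont_know), euclidean(no, dont_know))
```

The trade-off is that each categorical question becomes several columns, and an integer question such as `HorsePower` would typically need scaling so that its larger range does not dominate the distance.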
What do you think about this approach?
Topic numerical scikit-learn classification categorical-data machine-learning
Category Data Science