Keras: How to normalize dataframe with continuous and categorical data?

I have a dataframe with about 50 columns. The columns are either categorical or continuous data. The continuous data can be between 0.000001-1.00000 or they can be between 500,000-5,000,000. The categorical data is usually a name, for example a store name.

How can I normalize this data so that I can feed it into a dense layer of a Sequential model?

The Y values are either 0 or 1, so it is a binary classification problem. I am currently normalizing all of the continuous data to be 0-1 and one-hot encoding all of the categorical data, so that if I have a column with 5 names in it, I will get a matrix with 5 columns filled with 0s and 1s. Then I join all of the continuous and categorical data and feed it into a Dense layer with init='uniform' and activation='relu'.

Is this the standard way of doing things?

Topic keras tensorflow theano deep-learning neural-network

Category Data Science


Yes, it is. You're doing well!

In most cases, categorical features (columns) should be one-hot encoded. Continuous features, however, can be a little more complicated.

There are two common ways to preprocess continuous features:

  1. scaling features to range [0, 1] (as you have done)
  2. removing the mean and scaling to unit variance (so that the feature has zero mean and a standard deviation of 1)
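
As a rough illustration of the two options above, here is a minimal NumPy sketch. The array `x` is made-up data standing in for one continuous column with a large range (like the 500,000-5,000,000 case in the question); it is not taken from the asker's dataframe.

```python
import numpy as np

# Hypothetical continuous column with a large range (illustrative values only).
x = np.array([500_000.0, 1_250_000.0, 3_000_000.0, 5_000_000.0])

# Option 1: min-max scaling, which maps the smallest value to 0 and the largest to 1.
x_minmax = (x - x.min()) / (x.max() - x.min())

# Option 2: standardization, which subtracts the mean and divides by the
# standard deviation so the result has zero mean and unit variance.
x_standard = (x - x.mean()) / x.std()

print(x_minmax)
print(x_standard)
```

In either case, the statistics (min/max or mean/standard deviation) would normally be computed on the training data only and then reused to transform validation and test data.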

In practice, I choose between these two methods depending on the dataset.
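
The overall preprocessing described in the question (scale the continuous columns, one-hot encode the categorical ones, then join them) could be sketched with pandas roughly as follows. The column names `store`, `ratio`, and `revenue` and all values are hypothetical placeholders, and min-max scaling is used here only because it is what the question describes.

```python
import pandas as pd

# Hypothetical dataframe: one categorical column and two continuous columns.
df = pd.DataFrame({
    "store": ["A", "B", "A", "C"],
    "ratio": [0.000001, 0.25, 0.5, 1.0],
    "revenue": [500_000.0, 1_250_000.0, 3_000_000.0, 5_000_000.0],
})
continuous = ["ratio", "revenue"]
categorical = ["store"]

# Min-max scale each continuous column to [0, 1].
col_min = df[continuous].min()
col_max = df[continuous].max()
scaled = (df[continuous] - col_min) / (col_max - col_min)

# One-hot encode the categorical columns: one 0/1 column per distinct name.
one_hot = pd.get_dummies(df[categorical], dtype=float)

# Join both parts into a single feature matrix for the first Dense layer.
features = pd.concat([scaled, one_hot], axis=1)
X = features.to_numpy()

print(features.columns.tolist())
print(X.shape)
```

The number of columns in `X` is what the first Dense layer would need as its input dimension.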
