Keras: How to normalize dataframe with continuous and categorical data?

Question

user1367204

June 9, 2019, 15:45

I have a dataframe with about 50 columns. Each column holds either categorical or continuous data. The continuous values range either from 0.000001 to 1.0 or from 500,000 to 5,000,000. The categorical data is usually a name, for example a store name.

How can I normalize this data so that I can feed it into a dense layer of a Sequential model?

The Y values are either 0 or 1, so it is a binary classification problem. I am currently normalizing all of the continuous data to the range 0-1 and one-hot encoding all of the categorical data, so that if I have a column with 5 names in it, I get a matrix with 5 columns filled with 0's and 1's. Then I join all of the continuous and categorical data and feed it into a Dense layer with init='uniform' and activation='relu'.

Is this the standard way of doing things?

Topic keras tensorflow theano deep-learning neural-network

Category Data Science

Icyblade · Accepted Answer · February 4, 2017, 09:50

Yes, it is. You're doing well!

In most cases, categorical features (columns) should be one-hot encoded. Continuous features, however, can be a little more complicated.
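As a rough illustration of the one-hot encoding step, here is a minimal sketch using pandas. The toy dataframe and its column names ("store", "price") are hypothetical and only stand in for the real data.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "store": ["A", "B", "A", "C"],   # categorical feature
    "price": [1.0, 2.0, 3.0, 4.0],   # continuous feature
})

# Replace the categorical column with one 0/1 indicator column
# per distinct value; other columns are left unchanged.
encoded = pd.get_dummies(df, columns=["store"], dtype=float)

print(encoded.columns.tolist())
```

With three distinct store names, the "store" column becomes three indicator columns, and `encoded.to_numpy()` gives an array that can be passed to a Dense layer.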

There are two common ways to preprocess continuous features:

scaling each feature to the range [0, 1] (as you have done)
removing the mean and scaling to unit variance (so that the feature has zero mean and a standard deviation of 1)

In practice, I choose between these two approaches depending on the dataset.
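The two options for continuous features can be sketched with NumPy as follows. The sample values are made up to mimic the large-range columns from the question; this is an illustration, not a complete pipeline.

```python
import numpy as np

# Hypothetical continuous column with a large range.
x = np.array([500_000.0, 1_000_000.0, 2_500_000.0, 5_000_000.0])

# Option 1: min-max scaling, which maps the values into [0, 1].
x_minmax = (x - x.min()) / (x.max() - x.min())

# Option 2: standardization, which gives zero mean and
# a standard deviation of 1.
x_standard = (x - x.mean()) / x.std()

print(x_minmax)
print(x_standard)
```

In a real workflow the statistics (min, max, mean, standard deviation) would normally be computed on the training split only and then reused to transform validation and test data.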
