Using categorical and continuous variables in Deep Learning

I would like to apply an MLP to some business seller data. The data is a mix of categorical and continuous features. From what I have read, it is not advisable to feed a neural network both types of data together (I can no longer find the reference), and I remember reading that one can use the following model instead:

Categorical variables--NN model 1 
                                        -----NN model 3----Output
Continuous variables---NN model 2

So in this model there are two neural networks, one fed only the categorical variables and the other only the continuous variables, and the outputs of both are then fed to a third model.

Side note: As far as I can see, this proposed model could lead to an endless loop, because the output of model 1 might be categorical while the output of model 2 is continuous. (?)

My question is: how can I model data with a mixture of categorical and continuous features using deep learning (i.e. neural networks)? I do not want to use a random forest or any other form of decision tree.

Thanks in advance.



This might help if what you're asking is related to merging models:

Merging two different models in Keras
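As a rough illustration of the three-part layout from the question, here is a minimal sketch using the Keras functional API. It is not taken from the linked answer. The feature counts, layer sizes, embedding width, and the binary sigmoid output are all assumptions made for the example, and the data is random.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, chosen only for this sketch.
n_categories = 5   # distinct values of a single integer-coded categorical feature
n_continuous = 3   # number of continuous features

# "NN model 1": the categorical feature goes through an embedding layer.
cat_in = keras.Input(shape=(1,), dtype="int32", name="categorical")
x1 = layers.Embedding(input_dim=n_categories, output_dim=4)(cat_in)
x1 = layers.Flatten()(x1)

# "NN model 2": the continuous features go through a dense layer.
num_in = keras.Input(shape=(n_continuous,), name="continuous")
x2 = layers.Dense(8, activation="relu")(num_in)

# "NN model 3": concatenate both branch outputs and predict.
merged = layers.Concatenate()([x1, x2])
hidden = layers.Dense(8, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid")(hidden)  # assumes a binary target

model = keras.Model(inputs=[cat_in, num_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Random toy inputs, only to show that the shapes fit together.
rng = np.random.default_rng(0)
cat_data = rng.integers(0, n_categories, size=(16, 1))
num_data = rng.normal(size=(16, n_continuous)).astype("float32")
preds = model.predict([cat_data, num_data], verbose=0)
```

Both branches emit ordinary real-valued vectors, so the third part only ever receives continuous inputs. The three parts are trained together as one model.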


If I understand correctly, your main issue is how to transform categorical variables into continuous ones, or continuous variables into categorical ones.

You have many ways to do that:

Continuous to Categorical:

  • You can set the intervals yourself, turning each continuous value into a category. For example, for a variable Age, you could define the following intervals: [0;10[, [10;20[, ... [90;100[. Someone who is 36 years old falls in the interval [30;40[. You then have only 10 categories. The hard part is choosing intervals that avoid losing too much information.
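The Age example above can be sketched in plain Python as follows. The function name `to_interval` and its parameters are made up for this illustration, and equal-width bins are only one possible choice of intervals.

```python
def to_interval(value, width=10, low=0, high=100):
    """Map a continuous value to a half-open interval label such as '[30;40['."""
    if not low <= value < high:
        raise ValueError(f"{value} is outside [{low};{high}[")
    # Lower bound of the bin that contains the value.
    start = low + int((value - low) // width) * width
    return f"[{start};{start + width}["


ages = [3, 36, 40, 99.5]
labels = [to_interval(age) for age in ages]
print(labels)
```

The resulting labels are categories, so they still need to be encoded (for example as integers or one-hot vectors) before being fed to a network.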

Categorical to Continuous:

  • You can apply one of the Sklearn Category Encoders, which, using the target on the training set, transforms each category into a number (for a binary target, typically a value between 0 and 1). This value is computed from statistics (different for each encoder) and reflects how strongly the category is associated with the target.
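To make the idea concrete, here is a simplified hand-written sketch of one such statistic, the mean of the target per category. It is not the code of any particular encoder: library encoders usually add smoothing or other corrections. The function names and the toy data are invented for the example.

```python
from collections import defaultdict


def fit_mean_target_encoding(categories, targets):
    """Learn, on the training set, the mean target value of each category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    mapping = {category: sums[category] / counts[category] for category in counts}
    global_mean = sum(targets) / len(targets)  # fallback for unseen categories
    return mapping, global_mean


def transform(categories, mapping, default):
    """Replace each category by its learned value, or by the default if unseen."""
    return [mapping.get(category, default) for category in categories]


# Toy training data: a hypothetical seller region and a binary "sold" target.
regions = ["north", "south", "north", "east", "south", "north"]
sold = [1, 0, 1, 0, 1, 0]

mapping, prior = fit_mean_target_encoding(regions, sold)
encoded = transform(["north", "west"], mapping, prior)
```

The mapping must be learned on the training set only and then reused unchanged on validation and test data. Otherwise the target leaks into the features.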
