One-hot encoding with values other than 1

I was wondering: if I have an input feature with 36 possible values, and I expand it into 36 inputs where exactly one of them is non-zero, what is the optimal value for the non-zero input?

It may be:

[1, 0, 0, ..., 0]
[0, 1, 0, ..., 0]
[0, 0, 1, ..., 0]

Or:

[36, 0, 0, ..., 0]
[0, 36, 0, ..., 0]
[0, 0, 36, ..., 0]

Or even:

[6, 0, 0, ..., 0]
[0, 6, 0, ..., 0]
[0, 0, 6, ..., 0]

The goal is for this feature to have the same impact on the network as any other feature with an N(0, 1) distribution, keeping in mind that I will use L1 or L2 regularization.

So for every one weight on a normal input, I will have 36 weights on these one-hot inputs. With respect to L1, shouldn't those weights then be 36 times smaller in order to have the same impact?

But then their total effect on the result is small, since at any time only one of them is multiplied by 1 and included in the calculation...
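
To make my concern concrete, here is a small sketch in plain Python (the target contribution c = 0.5 is just a made-up number for illustration) of how the learned weight, and with it the regularization penalty, shrinks as the encoding value grows:

```python
# Suppose the "true" contribution of the active category to the
# pre-activation should be c = 0.5, and the active slot is encoded
# with value v. The network then has to learn a weight w with w * v = c.
c = 0.5
for v in (1, 6, 36):
    w = c / v  # weight the network must learn
    print(f"encoding value {v:>2}: weight = {w:.4f}, "
          f"L1 penalty = {abs(w):.4f}, L2 penalty = {w**2:.6f}")
```

The larger the encoding value, the smaller the weight and therefore the cheaper the feature becomes under L1/L2, which is exactly the asymmetry that puzzles me.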

So I would appreciate it if you could explain this and convince me to use 1s instead of 6s or 36s.

Tags: feature-engineering, feature-scaling, neural-network



If you use 0s and 1s, the design matrix is composed of indicator variables. An indicator variable marks the presence (1) or absence (0) of a category, and the corresponding model weight carries the magnitude. If you choose values other than 0 and 1, the weights will simply be rescaled relative to the larger values in the design matrix.
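
For a concrete picture, here is one common way to build such indicator variables, sketched with pandas (any one-hot encoder produces the same matrix):

```python
import pandas as pd

# A categorical feature with a few of its possible values.
df = pd.DataFrame({"category": ["a", "b", "c", "a"]})

# Expand the column into 0/1 indicator variables: each row has
# exactly one 1, marking the category that is present.
indicators = pd.get_dummies(df["category"], prefix="cat", dtype=int)
print(indicators)
#    cat_a  cat_b  cat_c
# 0      1      0      0
# 1      0      1      0
# 2      0      0      1
# 3      1      0      0
```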

Additionally, if you also have numeric features and are fitting a linear model, you should standardize them. Then all variables are on a comparable scale (the indicators take values in {0, 1}, and the standardized numerics have mean 0 and unit variance), which helps the model learn better and faster.
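
A minimal sketch of that combination, assuming scikit-learn is available (the column names are invented for illustration):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "category": ["a", "b", "c", "a"],     # the categorical feature
    "amount": [10.0, 250.0, 40.0, 90.0],  # an ordinary numeric feature
})

# One-hot encode the categorical column into 0/1 indicators and
# standardize the numeric column to mean 0 and unit variance, so both
# kinds of features enter the model on a comparable scale.
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(), ["category"]),
    ("scale", StandardScaler(), ["amount"]),
])

X = preprocess.fit_transform(df)
print(X)
```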
