Multi-Label Regression of a Categorical Probability Distribution That Sums to One

What would an ideal TensorFlow/Keras architecture look like if the target is a multi-output regression with values that sum to one?

Toy Example: TV Channels

You work for a big TV station, and your boss wants you to anticipate the market share of the five biggest channels based on features like

  • The weather (categorical)
  • The temperature (numerical)
  • Holiday (yes/no, binary)
  • The prime-time program of Channel 1 (categorical)
  • The prime-time program of Channel 2 (categorical)
  • ...
  • The prime-time program of Channel 20 (categorical)

The desired output would be something like

[Channel 1: 0.05, Channel 2: 0.25, Channel 3: 0.12, Channel 4: 0.08, Channel 5: 0.4, Others Together: 0.1]
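To make the setup concrete, here is one hypothetical way to encode a single training example and its target in NumPy (all feature names, category counts, and values below are assumptions for illustration only):

```python
import numpy as np

# Categorical features (weather, each channel's prime-time program) are
# one-hot encoded; numerical and binary features are used directly.
weather = [0.0, 1.0, 0.0]           # e.g. {sunny, rainy, snowy} -> rainy
temperature = [3.5]                 # degrees Celsius
holiday = [1.0]                     # yes/no as 0/1
program_ch1 = [1.0, 0.0, 0.0, 0.0]  # e.g. 4 program categories, channel 1

x = np.concatenate([weather, temperature, holiday, program_ch1])

# Target: market shares of the 5 biggest channels + "Others Together",
# which by definition sum to one.
y = np.array([0.05, 0.25, 0.12, 0.08, 0.40, 0.10])
assert abs(y.sum() - 1.0) < 1e-9
```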

Using categorical cross-entropy with a softmax activation function in the output layer ensures that the output sums to one and behaves like a probability distribution. The problem: categorical cross-entropy expects the labels to be one-hot encoded. Instead of taking the actual distribution, it expects a single true class during training and ignores the values of all other classes. The probability distribution produced by the trained model then results only from the uncertainty in the training data.

For example, suppose the feature vectors for the last 10 Christmas holidays were very similar (same cold temperature, similar TV program), but people were watching different channels. You would then get a model output like [Channel 1: 0.9, Channel 2: 0.1, ...] because in one out of ten years most people were watching Channel 2, while in the other nine years they were watching Channel 1. But instead of learning a distribution over what the true one-hot encoded answer is (the one true channel), the actual labels to be trained on are the market shares of the five channels plus a sixth "Others Together" category.
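The mechanics described above can be sketched in plain NumPy: with a one-hot label, categorical cross-entropy reduces to the negative log-probability of the single true class, so the values predicted for all other classes never enter the loss (logit values are made up for illustration):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: output is non-negative and sums to one.
    e = np.exp(z - z.max())
    return e / e.sum()

def categorical_crossentropy(y_true, y_pred):
    # -sum(y_true * log(y_pred)); with a one-hot y_true, only the
    # predicted probability of the true class contributes.
    return -np.sum(y_true * np.log(y_pred + 1e-12))

logits = np.array([2.0, 0.5, 0.1, -1.0, 0.3, -0.5])
y_pred = softmax(logits)
assert abs(y_pred.sum() - 1.0) < 1e-9   # softmax guarantees the sum

y_onehot = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
loss = categorical_crossentropy(y_onehot, y_pred)
# With a one-hot label, the loss equals -log of the true class probability.
assert abs(loss - (-np.log(y_pred[0] + 1e-12))) < 1e-9
```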

When simply treating this as a multi-output regression problem with six output nodes and MAE as the cost function, how can I ensure that the outputs add up to one? Or is there a better way of dealing with such a categorical probability distribution?
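One observation worth noting: the sum-to-one constraint comes from the softmax activation itself, not from the cross-entropy loss, so a softmax output layer can in principle be combined with a regression loss such as MAE. A minimal sketch of that setup (layer sizes and the input dimension are assumptions):

```python
import numpy as np
import tensorflow as tf

n_features = 9  # placeholder for the length of the encoded feature vector

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(32, activation="relu"),
    # 5 channels + "Others Together"; softmax keeps the outputs
    # non-negative and summing to one regardless of the loss used.
    tf.keras.layers.Dense(6, activation="softmax"),
])
model.compile(optimizer="adam", loss="mae")

pred = model.predict(np.random.rand(4, n_features), verbose=0)
# Each row of predictions sums to one by construction of softmax.
assert np.allclose(pred.sum(axis=1), 1.0, atol=1e-5)
```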

Topic: multi-output, keras, tensorflow, multilabel-classification

Category: Data Science
