Regression problem with Deep Learning

I'm working on the Housing Price dataset, where the target is to predict the housing price.

The price of a house is always positive, but I think the model could predict a negative value for some of the samples.

  1. If that is correct, is there any way to control training so that the model always predicts a positive value?

  2. In classification, we use the Sigmoid/Softmax activation function to normalize the outcome into a probability. Is there a similar activation function that produces only positive values?

  3. Can I use Poisson loss?



With small datasets, or data whose targets are close to or even at zero, this is a common phenomenon in regression. And "close to" is of course relative: when you have prices of up to 20 million, 500k is close to zero. All of this is independent of the method selected, whether it's a linear model or a DNN.

These are some ways to deal with this (from best to worst):

  • If you can get more data, get more data.
  • Select a strictly positive but unbounded activation function for the final layer. My favourite is softplus, a smooth approximation of ReLU.
  • Add a term to the loss function that penalizes below-zero predictions with a significant weight.
  • You could put more weight on smaller prices to force the model to be more accurate there, but you will lose accuracy on the higher prices.
    • Add weights to the training samples: higher weights for lower prices.
    • Preprocess the targets with a log function.
  • Set negative predictions to zero (or whatever lowest value makes sense) after the model.
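As a rough illustration of three of the options above, here is a minimal pure-Python sketch: a numerically stable softplus for the final layer, a weighted MSE with an extra penalty on negative predictions, and a log/exp round trip for the targets. The function names, the penalty value of 10.0 and the per-sample weights are hypothetical choices for illustration, not a specific framework's API; in practice you would use your framework's built-in softplus and loss utilities.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + exp(x)).

    Mathematically always > 0 and unbounded above; for very negative x the
    result can underflow to 0.0 in floating point.
    """
    # max(x, 0) + log1p(exp(-|x|)) equals log(1 + exp(x)) but never overflows exp.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def penalized_weighted_mse(y_true, y_pred, weights, penalty=10.0):
    """Weighted MSE plus an extra term that punishes negative predictions.

    `weights` (per-sample) and `penalty` are illustrative hyperparameters.
    """
    weighted_sq = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    mse = weighted_sq / sum(weights)
    # Squared size of each below-zero prediction; non-negative ones add 0.
    negative_part = sum(min(p, 0.0) ** 2 for p in y_pred) / len(y_pred)
    return mse + penalty * negative_part


def to_log_target(price: float) -> float:
    """Train on log(price) instead of price (requires price > 0)."""
    return math.log(price)


def from_log_prediction(z: float) -> float:
    """Map a log-space prediction back to a price; exp(z) can never be negative."""
    return math.exp(z)
```

For the sample-weight sub-bullet, one hypothetical choice is weights = [1.0 / t for t in y_true], which gives cheaper houses more influence on the loss. The log-target route has the side effect that any real-valued output maps back to a non-negative price.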

  1. The model can predict negative values for two main reasons:
  • Data is fed to the model without preprocessing, so check whether the data contains inappropriate values.
  • The model has not been trained on enough data, so feed it enough (e.g. 80% of the data for training).
  2. The target variable, "housing price", is continuous, which makes this a regression problem. For regression problems, ReLU is an activation function that works well. https://www.mygreatlearning.com/blog/relu-activation-function/

  3. Poisson loss is not well suited to this problem. For this problem, a logarithmic loss such as mean squared logarithmic error will work well.

When should you use Poisson loss? Only when your target variable follows a Poisson distribution. For example, when the target is "number of customers entering the store within the next hour", you can use the Poisson loss function.
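For reference, a minimal sketch of the Poisson loss as it is commonly implemented: the negative log-likelihood with a log link, dropping the constant log(y!) term. The function names are hypothetical; the per-sample formula exp(f) - y * f is the same one PyTorch's torch.nn.PoissonNLLLoss uses with log_input=True. Because the prediction is exp(f), it is positive by construction, whether or not a Poisson assumption suits the target.

```python
import math


def poisson_nll(y_true, raw_outputs):
    """Mean Poisson negative log-likelihood with a log link.

    Each raw model output f is treated as log(rate), so the predicted rate
    mu = exp(f) is positive by construction. The constant log(y!) term is
    dropped because it does not depend on the model.
    """
    total = 0.0
    for y, f in zip(y_true, raw_outputs):
        # -log P(y | mu) = mu - y * log(mu) + log(y!), and log(mu) == f here.
        total += math.exp(f) - y * f
    return total / len(y_true)


def predicted_rate(f: float) -> float:
    """Turn a raw output into the model's (positive) prediction."""
    return math.exp(f)
```

The per-sample loss exp(f) - y * f is convex in f and minimized at f = log(y), i.e. when the predicted rate equals the observed count.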


Whether the model predicts negative housing prices depends on the data. With a large amount of data containing no negative prices, the model is unlikely to predict a negative number. However, in rare cases, where the model is not trained well or has not seen similar samples, it is still possible.

  1. The model's predictions can still be forced to be non-negative after prediction by using a threshold: y = y if y > 0 else 0, where the predicted housing cost y is kept as-is if it is positive and set to 0 otherwise.

  2. The ReLU function works the way you want: the activation converts negative values to 0.
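The threshold in item 1 and the ReLU in item 2 compute the same mapping, max(0, y); the difference is whether it is applied after the model or inside it as the output activation. A minimal pure-Python sketch (function names are hypothetical) that also shows the ReLU derivative:

```python
def clip_at_zero(preds):
    """Post-hoc threshold: keep y if y > 0, else 0."""
    return [y if y > 0 else 0.0 for y in preds]


def relu(x: float) -> float:
    """ReLU output activation: the same max(0, x) mapping, applied inside the model."""
    return max(x, 0.0)


def relu_grad(x: float) -> float:
    """Derivative of ReLU (taking 0 at x == 0). Negative inputs get zero gradient."""
    return 1.0 if x > 0 else 0.0
```

One consequence of using ReLU as the output layer during training: for any sample whose pre-activation is negative, the gradient through the output is zero, so that sample gives no learning signal. The smooth softplus alternative mentioned earlier in the thread keeps a non-zero gradient everywhere.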

I am not sure about the Poisson loss; you may want to try it.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.