Force neural network to only produce positive values

I have a custom neural network written from scratch in Python, and a dataset where negative target/response values are impossible. However, my model sometimes produces negative forecasts/fits, which I'd like to avoid completely.

Rather than transform the input data or clip the final forecasts, I'd like to force my neural network to only produce positive values (or values above a given threshold) during forward and back propagation.

I believe I understand what needs to be done during forward propagation - I need to set negative values to 0. What I do not understand is how this affects the back propagation step, and what I must do to correctly propagate the derivative of the clipping of negative values, if such a thing needs to happen or even makes sense.

Any advice would be greatly appreciated. I have added example forward and back propagation code below, including the forward prop update I believe I need to make.

TLDR: How do I ensure y_hat is always positive, and how do I correctly back propagate if any updates are made to forward propagation?

FYI: I am using the ReLU activation function

Forward Prop code

z1 = x.dot(W1) + b1
a1 = activation(z1=z1)
y_hat = (a1.dot(W2) + b2).A1
# y_hat[y_hat < 0] = 0 # is this fine? If so, does anything need to happen in backprop?

Back Prop code

dCost = np.matrix((y_hat - y) / N).T         # gradient of the cost w.r.t. the output
dW2 = (a1.T).dot(dCost)                      # gradient for the output weights
db2 = np.sum(dCost)                          # gradient for the output bias
da1 = dCost.dot(W2.T)                        # gradient w.r.t. the hidden activations
dz1 = np.multiply(da1, d_activation(z1=z1))  # gradient through the ReLU
dW1 = np.dot(x.T, dz1)                       # gradient for the hidden weights
db1 = np.sum(dz1)                            # gradient for the hidden bias
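
For context, activation and d_activation above are the plain ReLU and its derivative; a minimal sketch of how such helpers might look (their actual implementation is not shown, so this is an assumption based on the names and keyword arguments used in the code):

import numpy as np

def activation(z1):
    # ReLU: element-wise max(0, z)
    return np.maximum(z1, 0)

def d_activation(z1):
    # ReLU derivative: 1 where z > 0, 0 otherwise (the undefined point at z == 0 is set to 0)
    return (z1 > 0).astype(float)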

Tags: backpropagation, neural-network

The ReLU (or "clipping negative values") is defined as $f(x) = \max(0, x)$, so its derivative is $$ f'(x) = \begin{cases} 0 &\quad\text{if}\quad x < 0 \\ \text{undefined} &\quad\text{if}\quad x = 0 \\ 1 &\quad\text{if}\quad x > 0 \end{cases} $$ To handle the undefined case at $x = 0$ you usually assign $0$ (or some very small value $\epsilon$). Using the chain rule to compute all the gradients, you have to multiply dCost element-wise by $f'$ evaluated at the pre-clipped output, which leaves you with the following code.

dCost = np.matrix((y_hat - y) / N).T
dy_hat = dCost.copy()
dy_hat[y_hat <= 0] = 0  # zero the gradient wherever the output was clipped
dW2 = (a1.T).dot(dy_hat)
...
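
For completeness, here is a sketch of how the forward and backward pass from the question might look with the output clipping folded in. It assumes the same variable names (x, y, W1, b1, W2, b2, N) and the matrix-based setup shown above; z2 and dy_hat are names introduced here for the pre-clipped output and its gradient:

# Forward prop with the output clipped at zero
z1 = x.dot(W1) + b1
a1 = activation(z1=z1)
z2 = a1.dot(W2) + b2                # keep the pre-clipped output for backprop
y_hat = np.maximum(z2.A1, 0)        # clip negative forecasts to zero

# Back prop, zeroing the gradient wherever the output was clipped
dCost = np.matrix((y_hat - y) / N).T
dy_hat = dCost.copy()
dy_hat[z2 <= 0] = 0                 # derivative of the clipping: 0 where z2 < 0, 1 where z2 > 0
dW2 = (a1.T).dot(dy_hat)
db2 = np.sum(dy_hat)
da1 = dy_hat.dot(W2.T)
dz1 = np.multiply(da1, d_activation(z1=z1))
dW1 = np.dot(x.T, dz1)
db1 = np.sum(dz1)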

You can find a more thorough explanation in this question.
