Why is my loss blowing up after adding regularization?
I tried to add L2 regularization to a network class I wrote, but when I train it the loss blows up even though the accuracy also increases. Can someone explain where I am going wrong? (I am using the formulas from here.)
The mini-batch update (the (1-eta*(lmbda/n)) coefficient on w is what I added):
def update_mini_batch(self, mini_batch, eta, lmbda, n):
    # n is the total number of training samples being trained on
    # Stack the one-dimensional samples in mini_batch into two matrices
    # and transpose them so that each sample is a column
    matrix_x, matrix_y = [np.array([arr for arr in arr_list]).transpose() for arr_list in zip(*mini_batch)]
    gradient_b, gradient_w = self.backprop(matrix_x, matrix_y)
    self.bias = [b-(eta/len(mini_batch))*db for b, db in zip(self.bias, gradient_b)]
    self.weights = [(1-eta*(lmbda/n))*w-(eta/len(mini_batch))*dw for w, dw in zip(self.weights, gradient_w)]
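For reference, the weight-decay update rule I am trying to implement (from the linked formulas, as I understand them) is

$$w \rightarrow \left(1 - \frac{\eta\lambda}{n}\right) w \;-\; \frac{\eta}{m} \sum_x \frac{\partial C_x}{\partial w}, \qquad b \rightarrow b - \frac{\eta}{m} \sum_x \frac{\partial C_x}{\partial b},$$

where m = len(mini_batch) and n is the size of the full training set.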
The function that evaluates the cost (I am using the quadratic cost):
def evaluate(self, data, lmbda):
    # Same reshaping as above: samples become the columns of matrix_x / matrix_y
    matrix_x, matrix_y = [np.array([arr for arr in arr_list]).transpose() for arr_list in zip(*data)]
    output_matrix = self.feedforward(matrix_x)
    cost = self.cost_func.apply(output_matrix, matrix_y)
    # L2 regularization term added to the data cost
    cost += lmbda/(2*(matrix_y.shape[1])) * sum(np.linalg.norm(w)**2 for w in self.weights)
    # Count samples whose predicted class matches the label
    acc = np.sum(output_matrix.argmax(axis=0)==matrix_y.argmax(axis=0))
    return cost, acc
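So the total cost reported each epoch is meant to be

$$C = C_0 + \frac{\lambda}{2\,m_{\text{data}}} \sum_w \lVert w \rVert^2,$$

where $C_0$ is the quadratic cost returned by cost_func.apply and $m_{\text{data}}$ = matrix_y.shape[1] is the number of samples passed to evaluate.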
An example of my cost and accuracy during training:
Epoch 0 done! Cost: 11.938649143175008. Accuracy 7397 / 10000
Epoch 1 done! Cost: 16.017232330762045. Accuracy 7381 / 10000
Epoch 2 done! Cost: 21.62351585060393. Accuracy 7431 / 10000
Epoch 3 done! Cost: 30.96422767377938. Accuracy 7498 / 10000
Epoch 4 done! Cost: 45.75409202821266. Accuracy 7669 / 10000
Epoch 5 done! Cost: 67.47752609972852. Accuracy 7691 / 10000
Epoch 6 done! Cost: 97.56030814767621. Accuracy 7574 / 10000
Epoch 7 done! Cost: 133.3273570333546. Accuracy 7503 / 10000
Epoch 8 done! Cost: 174.7085211732363. Accuracy 7341 / 10000
After running this for longer, the cost keeps increasing, and no change to eta or lambda fixes this. One thing I noticed is that the per-sample MSE behaves normally; it is only the magnitude of the weights that keeps growing.
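To check that, I log the data-fit term and the penalty term separately with roughly the snippet below (a quick sketch outside the class; net is my trained network instance, data is the test set, and lmbda is the same value I pass during training, so these names are not part of the class code above):

import numpy as np

# Reshape exactly as in evaluate(): samples become columns
matrix_x, matrix_y = [np.array([arr for arr in arr_list]).transpose() for arr_list in zip(*data)]
output_matrix = net.feedforward(matrix_x)

data_cost = net.cost_func.apply(output_matrix, matrix_y)        # quadratic cost only
weight_norm = sum(np.linalg.norm(w)**2 for w in net.weights)    # sum of squared weight norms
reg_term = lmbda / (2 * matrix_y.shape[1]) * weight_norm        # the L2 penalty evaluate() adds

print(f"data cost: {data_cost}, sum ||w||^2: {weight_norm}, penalty: {reg_term}")

The data cost stays flat while the weight-norm term (and hence the penalty) grows every epoch.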
Topic regularization loss-function neural-network
Category Data Science