Why is my loss blowing up after adding regularization?

I tried to add L2 regularization to a network class I wrote, but when I train it the loss blows up even though accuracy also increases. Can someone explain where I am going wrong? (I am using the formulas from here.)

The mini-batch update (the (1-eta*(lmbda/n)) coefficient on w is what I added):

def update_mini_batch(self, mini_batch, eta, lmbda, n):
    # n is the number of training samples being trained from
    # Turn the mini_batch of one-dimensional samples into two matrices and
    # transpose them so that the samples sit in columns
    matrix_x, matrix_y = [np.array(arr_list).transpose() for arr_list in zip(*mini_batch)]
    
    gradient_b, gradient_w = self.backprop(matrix_x, matrix_y)
    
    self.bias = [b-(eta/len(mini_batch))*db for b, db in zip(self.bias, gradient_b)]
    self.weights = [(1-eta*(lmbda/n))*w-(eta/len(mini_batch))*dw for w, dw in zip(self.weights, gradient_w)]
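
For reference, my reading of the linked formulas is the standard L2-regularized update, where $n$ is the size of the full training set and $m$ is the mini-batch size:

$$C = \frac{1}{2n}\sum_x \lVert y(x) - a^L(x)\rVert^2 \;+\; \frac{\lambda}{2n}\sum_w w^2$$

$$w \rightarrow \left(1 - \frac{\eta\lambda}{n}\right)w - \frac{\eta}{m}\sum_x \frac{\partial C_x}{\partial w},
\qquad
b \rightarrow b - \frac{\eta}{m}\sum_x \frac{\partial C_x}{\partial b}$$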

The function that evaluates the cost (I am using the quadratic cost):

def evaluate(self, data, lmbda):
    matrix_x, matrix_y = [np.array(arr_list).transpose() for arr_list in zip(*data)]
    output_matrix = self.feedforward(matrix_x)
    
    cost = self.cost_func.apply(output_matrix, matrix_y) 
    # L2 regularization term
    cost += lmbda/(2*(matrix_y.shape[1])) * sum(np.linalg.norm(w)**2 for w in self.weights)
    acc = np.sum(output_matrix.argmax(axis=0)==matrix_y.argmax(axis=0))
    
    return cost, acc

An example of my cost and accuracy during training:

Epoch 0 done! Cost: 11.938649143175008. Accuracy 7397 / 10000
Epoch 1 done! Cost: 16.017232330762045. Accuracy 7381 / 10000
Epoch 2 done! Cost: 21.62351585060393. Accuracy 7431 / 10000
Epoch 3 done! Cost: 30.96422767377938. Accuracy 7498 / 10000
Epoch 4 done! Cost: 45.75409202821266. Accuracy 7669 / 10000
Epoch 5 done! Cost: 67.47752609972852. Accuracy 7691 / 10000
Epoch 6 done! Cost: 97.56030814767621. Accuracy 7574 / 10000
Epoch 7 done! Cost: 133.3273570333546. Accuracy 7503 / 10000
Epoch 8 done! Cost: 174.7085211732363. Accuracy 7341 / 10000

After running this for longer, the cost still increases continually, and no change in eta or lambda fixes this. One thing I noticed was that the MSE term itself behaved normally; it was just the magnitude of the weights that kept increasing.
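
For reference, this is roughly how I tracked the weight magnitudes (a quick sketch; net stands in for my network instance, which keeps its weight matrices in self.weights as above):

import numpy as np

def total_weight_norm(net):
    # Sum of squared Frobenius norms of all weight matrices -- the same
    # quantity the L2 penalty is proportional to.
    return sum(np.linalg.norm(w)**2 for w in net.weights)

Printing this after every epoch showed it growing steadily even while the per-sample MSE stayed flat.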



When you use regularization, the reported loss will be larger because the regularization term is added to it. So it is normal if your best loss without regularization is higher than your best loss with no regularization.

However, adding regularization should not affect convergence in the long term: even though the overall loss might be larger, it should still decrease after some epochs.

I see that it increases for the first few epochs. I would suggest running your model for longer and seeing how the loss behaves. If an epoch takes too long, use a small chunk of your dataset to investigate the behaviour of the loss; it may simply be that the loss has not started converging yet in the first few epochs.
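
If each epoch is expensive, something along these lines would let you watch the regularized and unregularized cost side by side on a small chunk (a rough sketch based on the methods you posted; net_factory and training_data are placeholders for however you construct the network and load your data):

import random

def inspect_loss(net_factory, training_data, eta, lmbda, epochs=30, chunk=1000, batch_size=10):
    # Train on a small random chunk so each epoch is cheap, and print the cost
    # with and without the regularization term after every epoch.
    data = random.sample(training_data, chunk)
    net = net_factory()
    for epoch in range(epochs):
        random.shuffle(data)
        for k in range(0, len(data), batch_size):
            net.update_mini_batch(data[k:k + batch_size], eta, lmbda, n=len(data))
        reg_cost, acc = net.evaluate(data, lmbda)
        plain_cost, _ = net.evaluate(data, 0.0)
        print(f"Epoch {epoch}: regularized cost {reg_cost:.4f}, "
              f"unregularized cost {plain_cost:.4f}, accuracy {acc}/{len(data)}")

If the cost without the penalty keeps decreasing while only the total keeps growing, you at least know the increase is coming entirely from the weight-norm term.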
