"Invalid value" in RMSprop implementation from scratch in Python

Edit 2: The regularization term (reg_term) is sometimes negative due to negative parameters. Hence S[f"dW{l}"] contains some negative values. I realized the reg_term has to be added to the gradient before squaring, like this:

S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * np.square(gradients[f"dW{l}"] + reg_term)
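
A quick numeric check (with made-up values) shows why the order matters: squaring first and then adding a negative reg_term can produce a negative entry, while squaring the regularized gradient cannot:

import numpy as np

g = np.array([0.1])          # example gradient entry (made up)
reg_term = np.array([-0.5])  # negative, because the parameter is negative

print(np.square(g) + reg_term)   # [-0.49] -> negative, later becomes nan under sqrt
print(np.square(g + reg_term))   # [ 0.16] -> squares are never negative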

Edit 1: I see that S[f"dW{l}"] contains some negative values. How is this possible when np.square(gradients[f"dW{l}"]) only contains non-negative values?

I have implemented a neural network from scratch that uses mini-batch gradient descent. The network itself works well, and I have verified that it also trains well with momentum. Unfortunately, I can't get my RMSprop implementation to work.

I get a RuntimeWarning when training the network with RMSprop: "invalid value encountered in sqrt". This happens in the RMSprop update step.
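
The warning itself is straightforward to reproduce: np.sqrt returns nan for negative inputs and emits exactly this message. Turning the warning into an error with np.seterr makes it easy to find the offending line (a minimal repro with a made-up array):

import numpy as np

S_bad = np.array([0.2, -0.3])   # one negative entry, like the broken S[f"dW{l}"]
print(np.sqrt(S_bad))           # RuntimeWarning: invalid value encountered in sqrt
                                # -> [0.4472136        nan]

np.seterr(invalid="raise")      # the same operation now raises FloatingPointError,
                                # with a traceback pointing at the update step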

My implementation of update_parameters:

import numpy as np

def update_parameters(parameters, gradients, V, S, batch_size, t, learning_rate, reg_param):
    L = len(parameters) // 2
    beta1 = 0.9    # beta1, V and t belong to the momentum variant (not shown here)
    beta2 = 0.999
    epsilon = 1e-8

    for l in range(1, L + 1):
        # L2 regularization term (can be negative wherever the parameters are negative)
        reg_term = (reg_param / batch_size) * parameters[f"W{l}"]

        # RMSprop gradients
        S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * (np.square(gradients[f"dW{l}"]) + reg_term)
        S[f"db{l}"] = beta2 * S[f"db{l}"] + (1 - beta2) * np.square(gradients[f"db{l}"])

        # RMSprop update
        parameters[f"W{l}"] -= learning_rate * (gradients[f"dW{l}"] / np.sqrt(S[f"dW{l}"]) + epsilon)
        parameters[f"b{l}"] -= learning_rate * (gradients[f"db{l}"] / np.sqrt(S[f"db{l}"]) + epsilon)
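
A quick way to confirm where the negatives come from is to count negative entries per cache right after the accumulation step. Only the dW caches can go negative, since only they receive reg_term (a hypothetical diagnostic, not part of the original code):

# Hypothetical diagnostic, dropped in right after the S updates above:
negatives = {key: int((cache < 0).sum()) for key, cache in S.items()}
print(negatives)   # e.g. {'dW1': 3, 'db1': 0, 'dW2': 1, 'db2': 0}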

This is how I initialize the parameters:

def init_params_V_and_S(activation_layers):
    params = {}
    V = {}   # momentum cache; its initialization is omitted here
    S = {}
    L = len(activation_layers)

    for l in range(1, L):
        # He initialization for the weights, zeros for the biases
        params[f"W{l}"] = np.random.randn(activation_layers[l], activation_layers[l-1]) * np.sqrt(2 / activation_layers[l-1])
        params[f"b{l}"] = np.zeros((activation_layers[l], 1))

        # RMSprop params
        S[f"dW{l}"] = np.zeros((activation_layers[l], activation_layers[l-1]))
        S[f"db{l}"] = np.zeros((activation_layers[l], 1))

    return params, V, S
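
For reference, a minimal sketch of how the two functions fit together; the layer sizes and gradient values below are made-up placeholders, not the real network:

# Hypothetical wiring: 2 inputs, a hidden layer of 3 units, 1 output.
activation_layers = [2, 3, 1]
params, V, S = init_params_V_and_S(activation_layers)

# Stand-in gradients with matching shapes (the real ones come from backprop).
gradients = {
    "dW1": np.random.randn(3, 2), "db1": np.zeros((3, 1)),
    "dW2": np.random.randn(1, 3), "db2": np.zeros((1, 1)),
}

update_parameters(params, gradients, V, S, batch_size=32, t=1,
                  learning_rate=0.01, reg_param=0.1)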

Any ideas what's causing this?



Answer: The regularization term (reg_term) is sometimes negative due to negative parameters. Hence S[f"dW{l}"] contains some negative values. The reg_term has to be added to the gradient before squaring, like this:

S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * np.square(gradients[f"dW{l}"] + reg_term)
