Negative loss values for adaptive loss in tensorflow

I have used the adaptive (robust) loss implementation on a neural network, but after training the model for long enough I am getting negative loss values. Any help/suggestions would be highly appreciated! Please let me know if you need additional info.

Model definition -

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Flatten, RepeatVector, Concatenate, GRU, Dropout, Dense

hyperparameter_space = {'gru_up': 64,
                        'up_dropout': 0.2,
                        'learning_rate': 0.004}

def many_to_one_model(params):
    input_1 = tf.keras.Input(shape = (1, 53), name = 'input_1')
    input_2 = tf.keras.Input(shape = (1, 19), name = 'input_2')
    input_3 = tf.keras.Input(shape = (1, 130), name = 'input_3')

    input_3_flatten = Flatten()(input_3)
    input_3_flatten = RepeatVector(1)(input_3_flatten)

    concat_outputs = Concatenate()([input_1, input_2, input_3_flatten])

    output_1 = GRU(units = int(params['gru_up']),
                   kernel_initializer = tf.keras.initializers.he_uniform(),
                   activation = 'relu')(concat_outputs)
    output_1 = Dropout(rate = float(params['up_dropout']))(output_1)
    output_1 = Dense(units = 1, 
                     activation = 'linear',
                     name = 'output_1')(output_1)

    model = tf.keras.models.Model(inputs = [input_1, input_2, input_3],
                                  outputs = [output_1],
                                  name = 'many_to_one_model')

    return model

many_to_one_model(hyperparameter_space).summary()

Model summary -

'''
Model: many_to_one_model
______________________________________________________________________________________________
Layer (type)                      Output Shape       Param #       Connected to
______________________________________________________________________________________________
input_3 (InputLayer)              [(None, 1, 130)]   0    

flatten_5 (Flatten)               (None, 130)        0             input_3[0][0]

input_1 (InputLayer)              [(None, 1, 53)]    0

input_2 (InputLayer)              [(None, 1, 19)]    0

repeat_vector_5 (RepeatVector)    (None, 1, 130)     0             flatten_5[0][0]

concatenate_5 (Concatenate)       (None, 1, 202)     0             input_1[0][0]
                                                                   input_2[0][0]
                                                                   repeat_vector_5[0][0]
gru_5 (GRU)                       (None, 64)         51456         concatenate_5[0][0]

dropout_5 (Dropout)               (None, 64)         0             gru_5[0][0]

output_1 (Dense)                  (None, 1)          65            dropout_5[0][0]
_____________________________________________________________________________________________
Total params: 51,521
Trainable params: 51,521
Non-trainable params: 0

'''
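As a quick sanity check of the wiring above, the model can be called on dummy inputs that match the three Input shapes. This is only a sketch; the batch size of 4, the random arrays and the variable names are made up for illustration:

dummy_1 = np.random.rand(4, 1, 53).astype(np.float32)
dummy_2 = np.random.rand(4, 1, 19).astype(np.float32)
dummy_3 = np.random.rand(4, 1, 130).astype(np.float32)

check_model = many_to_one_model(hyperparameter_space)
preds = check_model([dummy_1, dummy_2, dummy_3])
print(preds.shape)   # expected: (4, 1) -- one linear output per sample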

Adaptive loss implementation -

import mlflow
from tensorflow.keras.callbacks import LambdaCallback

import robust_loss.general
import robust_loss.adaptive

model = many_to_one_model(hyperparameter_space)

# One (alpha, scale) pair is learned per channel; this model has a single output channel.
adaptive_lossfun = robust_loss.adaptive.AdaptiveLossFunction(num_channels = 1,
                                                             float_dtype = np.float32)

# Optimise the network weights and the loss's own alpha/scale parameters jointly.
variables = (list(model.trainable_variables) + list(adaptive_lossfun.trainable_variables))

optimizer_call = getattr(tf.keras.optimizers, 'Adam')
optimizer = optimizer_call(learning_rate = hyperparameter_space['learning_rate'],
                           amsgrad = True)

mlflow_callback = LambdaCallback()

for epoch in range(750):
    def lossfun():
        # Stealthily unsqueeze to an (n, 1) matrix, and then compute the loss.
        # A matrix with this shape corresponds to a loss where there's one shape
        # and scale parameter per dimension (and there's only one dimension for
        # this data).
        aa = y_train_up - model([train_cat_ip, train_num_ip, ex_train_num_ip])
        mean_calc = tf.reduce_mean(adaptive_lossfun(aa))
        return mean_calc

    # Update the model weights and the adaptive loss parameters in one step.
    optimizer.minimize(lossfun, variables)

    loss = lossfun()
    alpha = adaptive_lossfun.alpha()[0, 0]
    scale = adaptive_lossfun.scale()[0, 0]
    print('{:4}: loss={:+0.5f} alpha={:0.5f} scale={:0.5f}'.format(epoch, loss, alpha, scale))
    mlflow_callback.on_batch_end(epoch, mlflow.log_metrics({'loss': loss.numpy(),
                                                            'alpha': alpha.numpy(),
                                                            'scale': scale.numpy()},
                                                           epoch))
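
For reference, the (n, 1) residual shape the comment refers to can be checked in isolation. A minimal sketch with made-up residual values (the numbers mean nothing; they only show the expected shapes):

toy_residuals = np.array([0.3, -1.2, 0.05, 2.4, -0.7], dtype=np.float32).reshape(-1, 1)
print(toy_residuals.shape)                              # (5, 1): one column per channel
print(tf.reduce_mean(adaptive_lossfun(toy_residuals)))  # scalar loss at the current alpha/scale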

Loss, alpha and scale vs. epochs:

[plot: loss against epochs]

Here is the github repo for robust adaptive loss: https://github.com/google-research/google-research/tree/5b4f2d4637b6adbddc5e3261647414e9bdc8010c/robust_loss



Stumbled over this today as well. For everyone who finds this through a Google search, this is what the docs in the code say:

"These "losses" are actually negative log-likelihoods (as produced by distribution.nllfun()) and so they are not actually bounded from below by zero --- it is okay if they go negative! You'll probably want to minimize their sum or mean."
