Negative loss values for adaptive loss in TensorFlow
I have used the (robust) adaptive loss implementation on a neural network; however, after training the model long enough, I start getting negative loss values. Any help or suggestion would be highly appreciated! Please let me know if you need additional info.
Model definition -
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Flatten, RepeatVector, Concatenate, GRU, Dropout, Dense

hyperparameter_space = {'gru_up': 64,
                        'up_dropout': 0.2,
                        'learning_rate': 0.004}

def many_to_one_model(params):
    input_1 = tf.keras.Input(shape = (1, 53), name = 'input_1')
    input_2 = tf.keras.Input(shape = (1, 19), name = 'input_2')
    input_3 = tf.keras.Input(shape = (1, 130), name = 'input_3')
    # Flatten the wide third input, then repeat it to a single timestep so it
    # can be concatenated with the other two (batch, 1, features) inputs.
    input_3_flatten = Flatten()(input_3)
    input_3_flatten = RepeatVector(1)(input_3_flatten)
    concat_outputs = Concatenate()([input_1, input_2, input_3_flatten])
    output_1 = GRU(units = int(params['gru_up']),
                   kernel_initializer = tf.keras.initializers.he_uniform(),
                   activation = 'relu')(concat_outputs)
    output_1 = Dropout(rate = float(params['up_dropout']))(output_1)
    output_1 = Dense(units = 1,
                     activation = 'linear',
                     name = 'output_1')(output_1)
    model = tf.keras.models.Model(inputs = [input_1, input_2, input_3],
                                  outputs = [output_1],
                                  name = 'many_to_one_model')
    return model
many_to_one_model(hyperparameter_space)
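For reference, a quick smoke test on random arrays (dummy data; the batch size of 4 is arbitrary) confirms the three-input wiring and the single regression output:

import numpy as np

smoke_model = many_to_one_model(hyperparameter_space)
dummy = [np.random.randn(4, 1, 53).astype(np.float32),   # input_1
         np.random.randn(4, 1, 19).astype(np.float32),   # input_2
         np.random.randn(4, 1, 130).astype(np.float32)]  # input_3
print(smoke_model(dummy).shape)  # (4, 1)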
Model summary -
'''
Model: "many_to_one_model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape        Param #   Connected to
==================================================================================================
input_3 (InputLayer)            [(None, 1, 130)]    0
flatten_5 (Flatten)             (None, 130)         0         input_3[0][0]
input_1 (InputLayer)            [(None, 1, 53)]     0
input_2 (InputLayer)            [(None, 1, 19)]     0
repeat_vector_5 (RepeatVector)  (None, 1, 130)      0         flatten_5[0][0]
concatenate_5 (Concatenate)     (None, 1, 202)      0         input_1[0][0]
                                                              input_2[0][0]
                                                              repeat_vector_5[0][0]
gru_5 (GRU)                     (None, 64)          51456     concatenate_5[0][0]
dropout_5 (Dropout)             (None, 64)          0         gru_5[0][0]
output_1 (Dense)                (None, 1)           65        dropout_5[0][0]
==================================================================================================
Total params: 51,521
Trainable params: 51,521
Non-trainable params: 0
__________________________________________________________________________________________________
'''
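As a sanity check, the parameter count is consistent with TF2's GRU default of reset_after=True (which keeps separate input and recurrent bias vectors per gate):

# GRU: kernel (202 x 3*64) + recurrent kernel (64 x 3*64) + two biases (2 x 3*64)
units, input_dim = 64, 202
gru = 3 * units * (input_dim + units + 2)  # 51,456
dense = units + 1                          # 64 weights + 1 bias = 65
assert gru + dense == 51521                # matches "Total params" above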
Adaptive loss implementation -
import numpy as np
import tensorflow as tf
import mlflow
from tensorflow.keras.callbacks import LambdaCallback

import robust_loss.general
import robust_loss.adaptive

model = many_to_one_model(hyperparameter_space)

# One shape (alpha) and scale parameter per output channel; the target is 1-D.
adaptive_lossfun = robust_loss.adaptive.AdaptiveLossFunction(num_channels = 1,
                                                             float_dtype = np.float32)

# Optimize the model weights and the loss's alpha/scale parameters jointly.
variables = (list(model.trainable_variables) + list(adaptive_lossfun.trainable_variables))

optimizer_call = getattr(tf.keras.optimizers, 'Adam')
optimizer = optimizer_call(learning_rate = hyperparameter_space['learning_rate'],
                           amsgrad = True)

mlflow_callback = LambdaCallback()

for epoch in range(750):
    def lossfun():
        # Stealthily unsqueeze to an (n, 1) matrix, and then compute the loss.
        # A matrix with this shape corresponds to a loss where there's one shape
        # and scale parameter per dimension (and there's only one dimension for
        # this data).
        aa = y_train_up - model([train_cat_ip, train_num_ip, ex_train_num_ip])
        mean_calc = tf.reduce_mean(adaptive_lossfun(aa))
        return mean_calc

    optimizer.minimize(lossfun, variables)

    loss = lossfun()
    alpha = adaptive_lossfun.alpha()[0, 0]
    scale = adaptive_lossfun.scale()[0, 0]
    print('{:4}: loss={:+0.5f} alpha={:0.5f} scale={:0.5f}'.format(epoch, loss, alpha, scale))
    mlflow_callback.on_batch_end(epoch, mlflow.log_metrics({'loss': loss.numpy(),
                                                            'alpha': alpha.numpy(),
                                                            'scale': scale.numpy()},
                                                           epoch))
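(A side note on that last line: mlflow.log_metrics returns None, so passing it as the logs argument of on_batch_end only "works" because it is evaluated eagerly. A plainer equivalent, assuming an active MLflow run, would be:)

import mlflow

mlflow.log_metrics({'loss': float(loss),
                    'alpha': float(alpha),
                    'scale': float(scale)},
                   step = epoch)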
Loss, alpha and scale vs epochs graph -
Here is the GitHub repo for the robust adaptive loss: https://github.com/google-research/google-research/tree/5b4f2d4637b6adbddc5e3261647414e9bdc8010c/robust_loss