Why does a neural network need the loss to be a scalar?
I have a loss function that is a weighted cross-entropy loss for binary classification:
```python
import numpy as np
from tensorflow.keras import backend as K

def BinaryCrossEntropy_weighted(y_true, y_pred, class_weight):
    # np.float was removed in recent NumPy; cast labels explicitly to float32
    y_true = y_true.astype(np.float32)
    # Clip predictions away from 0 and 1 to avoid log(0)
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    first_term = class_weight[1] * y_true * K.log(y_pred + K.epsilon())
    second_term = class_weight[0] * (1.0 - y_true) * K.log(1.0 - y_pred + K.epsilon())
    # Mean over axis=0 only, so the result keeps the trailing axis -> shape (1,)
    loss = -K.mean(first_term + second_term, axis=0)
    return loss
```
And when I run it like this:

```python
loss = BinaryCrossEntropy_weighted(np.array(y), np.array(predict), class_weight)
```

I get this output:

```
tf.Tensor: shape=(1,), dtype=float64, numpy=array([0.16916199])
```
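To show where that `(1,)` shape comes from, here is a minimal sketch, assuming `y` and `predict` have shape `(batch, 1)` as Keras models usually output (the toy values are made up for illustration):

```python
import numpy as np
import tensorflow as tf

# Toy labels and predictions shaped (batch, 1)
y = np.array([[1.0], [0.0], [1.0], [0.0]])
predict = np.array([[0.9], [0.2], [0.8], [0.1]])

ce = -(y * np.log(predict) + (1 - y) * np.log(1 - predict))  # shape (4, 1)
# Averaging over axis=0 only keeps the trailing axis of size 1
loss_vec = tf.reduce_mean(ce, axis=0)   # shape (1,) -- a length-1 vector
loss_scalar = tf.reduce_mean(ce)        # shape ()   -- a true scalar
print(loss_vec.shape, loss_scalar.shape)
```

`K.mean(..., axis=0)` behaves the same way as `tf.reduce_mean(..., axis=0)` here.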
If you look carefully, you can see that the loss is a vector of shape `(1,)`, not a scalar, and I was passing this loss directly to my gradient tape and optimizer:
```python
grads1 = tape.gradient(loss, Final_model.trainable_weights)
optimizer1.apply_gradients(zip(grads1, Final_model.trainable_weights))
```
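For context, here is a self-contained sketch of the kind of custom training step involved; the tiny model, random data, and epsilon value are hypothetical stand-ins for my actual `Final_model` and `optimizer1`:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for Final_model and optimizer1 from the question
Final_model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
optimizer1 = tf.keras.optimizers.SGD(learning_rate=0.1)

x = np.random.rand(8, 3).astype(np.float32)
y = np.random.randint(0, 2, size=(8, 1)).astype(np.float32)

with tf.GradientTape() as tape:
    # The forward pass must happen inside the tape for gradients to exist
    predict = Final_model(x, training=True)
    ce = -(y * tf.math.log(predict + 1e-7)
           + (1 - y) * tf.math.log(1 - predict + 1e-7))
    loss = tf.reduce_mean(ce)  # mean over all axes -> a true scalar, shape ()

grads1 = tape.gradient(loss, Final_model.trainable_weights)
optimizer1.apply_gradients(zip(grads1, Final_model.trainable_weights))
```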
The result was that my loss did not decrease over multiple epochs, meaning my model weights were not being updated, i.e. the gradients were not flowing / not being computed. Am I correct about that?
If I am correct, the bigger question is: why doesn't TensorFlow allow/accept the loss as a vector? And in general, do neural networks allow the loss value to be a vector?