How is the loss function calculated, and what does n represent in MSE?

I hope you are doing well. I want to ask a question about the loss function in a neural network.

I know that the loss function is calculated for each data point in the training set, and that when backpropagation happens depends on the optimization scheme: batch gradient descent (backpropagation after all data points have been passed), mini-batch gradient descent (backpropagation after each batch), or stochastic gradient descent (backpropagation after each data point).

Now let's take the MSE loss function:
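For reference, the standard MSE formula (presumably the one shown in the image referenced here) is:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where y_i is the target and ŷ_i the network's prediction for the i-th data point.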

How can n be the number of data points? If we calculate the loss after each data point, then n would be 1 every time.

I also saw a video in which they define n as the number of nodes in the output layer (you can find what I'm talking about at 5:45): https://www.youtube.com/watch?v=Zr5viAZGndEt=5s

So I'm pretty confused: how do we calculate the loss function, and what does n represent? Also, when we have multiple inputs, are we only concerned with the output that the weight we are trying to update influences? Thanks in advance.

Topic mse loss-function gradient-descent deep-learning neural-network

Category Data Science


In the MSE formula, n represents the number of data points in the batch for which you are currently calculating the loss and performing backpropagation. For batch gradient descent this is the number of observations in the complete dataset; for mini-batch gradient descent it equals the batch size (or less, if the final batch is incomplete); for stochastic gradient descent it is 1.
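A minimal sketch of this, using a hypothetical toy dataset, shows that n is simply the number of points in whatever batch the loss is computed over:

```python
import numpy as np

# Toy dataset: 6 targets and predictions (single-output regression).
y_true = np.array([3.0, 1.5, 2.0, 4.0, 0.5, 2.5])
y_pred = np.array([2.5, 1.0, 2.5, 3.5, 1.0, 2.0])

def mse(targets, preds):
    """MSE over a batch: n is just the number of points in the batch."""
    return np.mean((targets - preds) ** 2)  # np.mean divides by n automatically

# Batch gradient descent: n = 6 (the whole dataset).
full_loss = mse(y_true, y_pred)

# Mini-batch gradient descent with batch size 2: n = 2 per update.
mini_losses = [mse(y_true[i:i+2], y_pred[i:i+2]) for i in range(0, 6, 2)]

# Stochastic gradient descent: n = 1, the loss is the squared error of one point.
sgd_losses = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

print(full_loss)             # 0.25 (every error here is ±0.5, squared = 0.25)
print(np.mean(mini_losses))  # 0.25 as well, since all batches have equal size
print(np.mean(sgd_losses))   # 0.25 as well
```

The per-update losses differ in general; they only average out identically here because every batch has the same size.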

The reason the video talks about summing the error over the number of nodes in the output layer is that in their example they are using a network with multiple output nodes, whereas MSE is generally used for regression problems where you only have a single output node (see for example also this question).
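The two views of n compose rather than conflict. A hedged sketch (the batch shape and values are made up for illustration): with multiple output nodes, the squared error is first averaged over output nodes per example, then over examples in the batch.

```python
import numpy as np

# Hypothetical batch: 4 examples, network with 3 output nodes.
y_true = np.array([[1.0, 0.0, 2.0],
                   [0.5, 1.5, 1.0],
                   [2.0, 2.0, 0.0],
                   [1.0, 1.0, 1.0]])
y_pred = y_true + 0.1  # every prediction off by 0.1, for an easily checked result

# Per-example loss: average squared error over the 3 output nodes
# (this is the "n = number of output nodes" view from the video).
per_example = np.mean((y_true - y_pred) ** 2, axis=1)

# Batch loss: average the per-example losses over the 4 data points
# (this is the "n = number of data points" view).
batch_loss = np.mean(per_example)

print(batch_loss)  # approximately 0.01, i.e. 0.1 squared
```

Whether you sum or average over the output nodes only changes the loss by a constant factor, which the learning rate can absorb.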

A network that uses multiple inputs has no impact on how the loss is calculated. In addition, because of the chain rule used in backpropagation, the algorithm only looks at the partial derivative of the loss with respect to a single weight or bias at a time.
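This can be illustrated with a minimal sketch (a single linear neuron with two inputs, not a full network): the chain rule gives dL/dw_j = (2/n) Σ_i (pred_i − y_i) x_ij, so the gradient for one weight only involves that weight's own input path, which a numerical check confirms.

```python
import numpy as np

# 3 data points, 2 input features, one linear neuron: pred = X @ w + b.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.1, 0.2])
b = 0.0

def loss(w, b):
    """MSE over the whole batch of n = 3 points."""
    pred = X @ w + b
    return np.mean((pred - y) ** 2)

# Chain rule: dL/dw_j = (2/n) * sum_i (pred_i - y_i) * x_ij
pred = X @ w + b
grad_w = (2 / len(y)) * X.T @ (pred - y)

# Numerical check of the partial derivative for the first weight alone:
# nudge only w[0] and see how the loss responds.
eps = 1e-6
w_plus = w.copy()
w_plus[0] += eps
numeric = (loss(w_plus, b) - loss(w, b)) / eps

print(np.allclose(grad_w[0], numeric, atol=1e-4))  # True
```

In a deep network the same principle holds; backpropagation just reuses intermediate chain-rule factors so each weight's partial derivative is computed efficiently.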
