Calculating gradient descent
When using mini-batch gradient descent, we perform backpropagation after each batch, i.e. we calculate the gradient once per batch. We also capture y-hat for each sample in the batch, then compute the loss function over the whole batch, and use that loss to calculate the gradient. Correct?
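To make that procedure concrete, here is a minimal NumPy sketch of one mini-batch step. It assumes a single linear neuron with a squared-error loss; the network, array names, and sizes are illustrative only, not taken from the question above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # mini-batch of 4 samples, 3 features each
y = rng.normal(size=(4, 1))          # targets
W = rng.normal(size=(3, 1))          # weights of the single linear neuron

# Forward pass: one y-hat per sample in the batch.
y_hat = X @ W                        # shape (4, 1)

# Loss over the whole batch: mean of the per-sample squared errors.
per_sample_loss = 0.5 * (y_hat - y) ** 2
batch_loss = per_sample_loss.mean()

# Backward pass, done once per batch: gradient of the batch loss w.r.t. W.
# Because the loss averages over the samples, the gradient averages the
# per-sample contributions.
grad_W = X.T @ (y_hat - y) / len(X)

# One gradient-descent update per mini-batch.
W -= 0.1 * grad_W
```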
Now, as the chain rule states, we calculate the gradient this way for the neural network below:
The question is: if we calculate the gradient only after passing all the samples in the batch, then we get a different y1-hat for each of the different inputs. Which y1-hat is used to calculate the gradient? I'm confused.
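To make the setup behind this question concrete, here is a small sketch using the same illustrative linear neuron and squared-error loss as above (not the network from the figure). It shows that the batch loss depends on every sample's y-hat, so the chain rule runs through each of them rather than through any single one; the gradient of the averaged batch loss equals the average of the per-sample gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 samples, 3 features
y = rng.normal(size=(4, 1))
W = rng.normal(size=(3, 1))

y_hat = X @ W                        # a different y-hat for every sample

# Gradient of each sample's squared-error loss taken separately ...
per_sample_grads = [x[:, None] * (yh - yt) for x, yh, yt in zip(X, y_hat, y)]

# ... and the gradient of the averaged batch loss: the two agree.
batch_grad = X.T @ (y_hat - y) / len(X)
print(np.allclose(np.mean(per_sample_grads, axis=0), batch_grad))  # True
```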
Topic: backpropagation, loss-function, gradient-descent, neural-network
Category: Data Science