Calculating the gradient in mini-batch gradient descent

When using mini-batch gradient descent, we perform backpropagation after each batch, i.e. we calculate the gradient once per batch. We also capture y-hat for every sample in the batch, then compute the loss function over the whole batch, and it is this batch loss that we use to calculate the gradient, correct? Now, as the chain rule states, this is how we calculate the gradient for the neural network below.
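To spell out what I mean by "compute the loss function over the whole batch", here is the picture I have in mind, assuming (purely for illustration) a mean squared error loss on a single output y1 and a batch of m samples:

$$L_{\text{batch}} = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_1^{(i)} - y_1^{(i)}\right)^2$$

where $\hat{y}_1^{(i)}$ is the y1-hat produced by the forward pass for the i-th sample in the batch.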

The question is: if we calculate the gradient only after passing all the samples in the batch, then we have a different y1-hat for each input. Which y1-hat is used to calculate the gradient? I'm confused.
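To make the mechanics concrete, here is a minimal runnable sketch of what I think a single mini-batch step looks like. The single linear layer, the batch of 4 samples with 3 features, the mean squared error loss, and the learning rate are all made up purely for illustration (this is not my actual network):

```python
# A made-up minimal example, not my real network: one linear layer, MSE loss.
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(4, 3))   # one mini-batch: 4 samples, 3 features
y = rng.normal(size=(4, 1))   # targets for those 4 samples
W = rng.normal(size=(3, 1))   # weights of the single linear layer
b = np.zeros((1, 1))          # bias

# Forward pass: one y-hat per sample, so 4 different y-hat values
y_hat = X @ W + b             # shape (4, 1)

# Loss computed over the whole batch (mean squared error)
loss = np.mean((y_hat - y) ** 2)

# Backward pass -- this is the part I am asking about:
# the gradient is taken of the batch-averaged loss, so every
# sample's y-hat appears in it, not just one of them.
grad_y_hat = 2.0 * (y_hat - y) / len(X)          # dL/d(y_hat), shape (4, 1)
grad_W = X.T @ grad_y_hat                        # dL/dW, shape (3, 1)
grad_b = grad_y_hat.sum(axis=0, keepdims=True)   # dL/db, shape (1, 1)

# One parameter update per mini-batch
lr = 0.1
W -= lr * grad_W
b -= lr * grad_b
```

Is that last step (taking the gradient of the loss averaged over all the y-hat values of the batch) the correct way to think about it?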

Topic: backpropagation, loss-function, gradient-descent, neural-network

Category: Data Science
