Calculating gradient descent
When using mini-batch gradient descent, we perform backpropagation after each batch, i.e. we calculate the gradient once per batch. We also capture y-hat for each sample in the batch, then compute the loss function over the whole batch, and use that loss to calculate the gradient. Correct?
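To make that procedure concrete, here is a minimal NumPy sketch of one mini-batch step. It assumes a single linear neuron with a squared-error loss; the network, array names, and sizes are illustrative only, not taken from the question above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # mini-batch of 4 samples, 3 features each
y = rng.normal(size=(4, 1))          # targets
W = rng.normal(size=(3, 1))          # weights of the single linear neuron

# Forward pass: one y-hat per sample in the batch.
y_hat = X @ W                        # shape (4, 1)

# Loss over the whole batch: mean of the per-sample squared errors.
per_sample_loss = 0.5 * (y_hat - y) ** 2
batch_loss = per_sample_loss.mean()

# Backward pass, done once per batch: gradient of the batch loss w.r.t. W.
# Because the loss averages over the samples, the gradient averages the
# per-sample contributions.
grad_W = X.T @ (y_hat - y) / len(X)

# One gradient-descent update per mini-batch.
W -= 0.1 * grad_W
```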
Now, as the chain rule states, we calculate the gradient this way for the neural network below:
The question is: if we calculate the gradient only after passing all the samples in the batch, then we get a different y1-hat for each of the different inputs. Which y1-hat is used to calculate the gradient? I'm confused.
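To make the setup behind this question concrete, here is a small sketch using the same illustrative linear neuron and squared-error loss as above (not the network from the figure). It shows that the batch loss depends on every sample's y-hat, so the chain rule runs through each of them rather than through any single one; the gradient of the averaged batch loss equals the average of the per-sample gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 samples, 3 features
y = rng.normal(size=(4, 1))
W = rng.normal(size=(3, 1))

y_hat = X @ W                        # a different y-hat for every sample

# Gradient of each sample's squared-error loss taken separately ...
per_sample_grads = [x[:, None] * (yh - yt) for x, yh, yt in zip(X, y_hat, y)]

# ... and the gradient of the averaged batch loss: the two agree.
batch_grad = X.T @ (y_hat - y) / len(X)
print(np.allclose(np.mean(per_sample_grads, axis=0), batch_grad))  # True
```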
Topic: backpropagation, loss-function, gradient-descent, neural-network
Category: Data Science