Backpropagation with log likelihood cost function and softmax activation
In chapter 3 of his online book on neural networks, Michael Nielsen introduces a new cost function, the log-likelihood cost, defined as
$$ C = -\ln(a_y^L), $$
where $y$ is the index of the desired output class.
Suppose we have 10 output neurons. When backpropagating the error, only the gradient with respect to the $y^{\text{th}}$ output neuron is non-zero, and all the others are zero. Is that right?
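Here is a quick NumPy sketch of what I mean (the layer size, values, and variable names are just ones I made up for illustration): since $C = -\ln(a_y^L)$ involves only the single activation $a_y^L$, the gradient of $C$ with respect to the activations has one non-zero entry.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of weighted inputs."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

np.random.seed(0)
z = np.random.randn(10)   # weighted inputs z_j^L of the 10 output neurons
a = softmax(z)            # activations a_j^L
y_idx = 3                 # index of the desired output, i.e. "y"

# Gradient of C = -ln(a_y^L) with respect to the activations:
# dC/da_j = -1/a_y for j == y, and 0 for every other j.
dC_da = np.zeros_like(a)
dC_da[y_idx] = -1.0 / a[y_idx]
print(dC_da)              # only the y-th entry is non-zero
```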
If so, how is equation (81) below true? $$\frac{\partial C}{\partial b_j^L} = a_j^L - y_j$$ Applying the chain rule neuron by neuron, with $\frac{\partial C}{\partial a_j^L} = -\frac{y_j}{a_j^L}$ and $\frac{\partial a_j^L}{\partial z_j^L} = a_j^L (1 - a_j^L)$, I instead get $$\frac{\partial C}{\partial b_j^L} = y_j (a_j^L - 1).$$
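To compare the two expressions, here is a small finite-difference check I put together (a single softmax output layer in isolation; the sizes, seed, and names are arbitrary and not from the book):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cost(b, w, x, y_idx):
    """Log-likelihood cost C = -ln(a_y^L) of a softmax output layer."""
    a = softmax(w @ x + b)
    return -np.log(a[y_idx])

np.random.seed(1)
n_in, n_out = 5, 10
w = np.random.randn(n_out, n_in)   # weights into the output layer
b = np.random.randn(n_out)         # biases b_j^L
x = np.random.randn(n_in)          # activations of the previous layer
y_idx = 3
y = np.eye(n_out)[y_idx]           # one-hot target vector y_j

a = softmax(w @ x + b)             # output activations a_j^L

# Central-difference estimate of dC/db_j for each bias
eps = 1e-6
num_grad = np.array([
    (cost(b + eps * np.eye(n_out)[j], w, x, y_idx)
     - cost(b - eps * np.eye(n_out)[j], w, x, y_idx)) / (2 * eps)
    for j in range(n_out)
])

print(num_grad)       # numerical dC/db_j
print(a - y)          # equation (81): a_j^L - y_j
print(y * (a - 1))    # the expression I derived: y_j (a_j^L - 1)
```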
Topic softmax backpropagation neural-network
Category Data Science