Understanding the step of SGD for binary classification
I cannot understand the step of SGD for binary classification.
For example, we have true labels $y \in \{0,1\}$ and predicted probabilities $p = f_\theta(x) \in [0,1]$.
Then the SGD update step is $\theta' \leftarrow \theta - \nu \frac{\partial L(y,f_\theta(x))}{\partial \theta}$, where $L$ is the loss function and $\nu$ is the learning rate. Then follows a substitution that I cannot understand: $\theta' \leftarrow \theta - \nu \left.\frac{\partial L(y,p)}{\partial p}\right|_{p=f_\theta(x)} \frac{\partial f_\theta(x)}{\partial \theta}$
Why do we need to take the derivative with respect to $p$? And why haven't we replaced $f_\theta(x)$ with $p$ in the last factor?
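To make the question concrete, here is a small numerical sketch (my own toy example, assuming log-loss and a sigmoid model $p = \sigma(\theta x)$; the names `loss`, `grad_chain`, etc. are mine) comparing the direct gradient $\partial L / \partial \theta$ with the chain-rule product $\frac{\partial L}{\partial p}\big|_{p=f_\theta(x)} \cdot \frac{\partial f_\theta(x)}{\partial \theta}$:

```python
import math

# Toy setup: one example x, scalar parameter theta,
# model p = f_theta(x) = sigmoid(theta * x), loss = log-loss.
x, y, theta = 2.0, 1.0, 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(theta):
    p = sigmoid(theta * x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Chain rule: dL/dtheta = (dL/dp, evaluated at p = f_theta(x)) * (df_theta(x)/dtheta)
p = sigmoid(theta * x)                 # forward pass: p = f_theta(x)
dL_dp = -(y / p) + (1 - y) / (1 - p)   # dL/dp, a function of p alone
df_dtheta = p * (1 - p) * x            # d sigmoid(theta*x) / dtheta
grad_chain = dL_dp * df_dtheta

# Direct numerical derivative of L(y, f_theta(x)) w.r.t. theta, for comparison
eps = 1e-6
grad_num = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)

print(grad_chain, grad_num)  # the two values agree up to numerical error
```

If I understand correctly, the two gradients should match, which is what I observe in this sketch.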
Topic: sgd, derivation, mathematics
Category: Data Science