Understanding the step of SGD for binary classification

I cannot understand the step of SGD for binary classification.

For example, suppose $y \in \{0,1\}$ is the true label and $p = f_\theta(x) \in [0,1]$ is the predicted probability.

Then the SGD update step is $\theta' \leftarrow \theta - \nu \frac{\partial L(y,f_\theta(x))}{\partial \theta}$, where $L$ is the loss function. Next comes the substitution that I cannot understand: $\theta' \leftarrow \theta - \nu \left.\frac{\partial L(y,p)}{\partial p}\right|_{p=f_\theta(x)} \frac{\partial f_\theta(x)}{\partial \theta}$

Why do we need to take the derivative with respect to $p$? And why isn't $f_\theta(x)$ replaced with $p$ in the last fraction?
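The substitution in question is the chain rule, and it can be checked numerically. Below is a minimal sketch assuming a concrete model and loss that the question leaves unspecified: $f_\theta(x) = \sigma(\theta^\top x)$ (logistic regression) and $L$ = binary cross-entropy. It computes $\left.\frac{\partial L}{\partial p}\right|_{p=f_\theta(x)}$ and $\frac{\partial f_\theta(x)}{\partial \theta}$ separately, multiplies them, and compares the product against a finite-difference estimate of $\frac{\partial L(y,f_\theta(x))}{\partial \theta}$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(y, p):
    # Binary cross-entropy L(y, p). This concrete choice of L is an
    # assumption for illustration; the question does not fix it.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
theta = rng.normal(size=3)   # parameters
x = rng.normal(size=3)       # one training example
y = 1.0                      # its true label

p = sigmoid(theta @ x)       # p = f_theta(x), the predicted probability

# Chain rule: dL/dtheta = (dL/dp evaluated at p = f_theta(x)) * dp/dtheta.
dL_dp = -(y / p) + (1 - y) / (1 - p)   # outer factor: derivative w.r.t. the scalar p
dp_dtheta = p * (1 - p) * x            # inner factor: derivative of f_theta(x) w.r.t. theta
grad_chain = dL_dp * dp_dtheta

# Compare with a central finite-difference approximation of dL/dtheta,
# computed without ever splitting the derivative into two factors.
eps = 1e-6
grad_num = np.array([
    (loss(y, sigmoid((theta + eps * e) @ x)) -
     loss(y, sigmoid((theta - eps * e) @ x))) / (2 * eps)
    for e in np.eye(3)
])

print(grad_chain)
print(np.allclose(grad_chain, grad_num, atol=1e-6))
```

The "evaluated at $p = f_\theta(x)$" part corresponds to plugging the forward-pass value `p` into `dL_dp`: the derivative is taken with respect to the symbol $p$, and only then is the concrete value substituted in.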

Topic: sgd, derivation, mathematics

Category: Data Science
