Backpropagation through time (BPTT) derivation issue

I have read several posts about BPTT for RNNs, but I am a bit confused about one step in the derivation. Given the recurrence

$$h_t=f(b+Wh_{t-1}+Ux_t)$$
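For concreteness, here is a minimal NumPy sketch of this recurrence. The choice of $f=\tanh$ and the dimensions are my own assumptions, not from the posts I read:

```python
import numpy as np

rng = np.random.default_rng(0)

H, D = 3, 2                      # hidden size, input size (arbitrary)
W = rng.normal(size=(H, H))      # recurrent weights
U = rng.normal(size=(H, D))      # input weights
b = rng.normal(size=H)           # bias

def step(h_prev, x):
    """One step of h_t = f(b + W h_{t-1} + U x_t), with f = tanh."""
    return np.tanh(b + W @ h_prev + U @ x)

# Unroll a short sequence
h = np.zeros(H)
xs = rng.normal(size=(4, D))
for x in xs:
    h = step(h, x)
```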

when we compute $\frac{\partial h_t}{\partial W}$, does anyone know why it is simply

$$\frac{\partial h_t}{\partial W}=\frac{\partial h_{t}}{\partial h_{t-1}}\frac{\partial h_{t-1}}{\partial W}$$

not

$$\frac{\partial h_t}{\partial W}=\frac{\partial h_{t}}{\partial h_{t-1}}\frac{\partial h_{t-1}}{\partial W}+f'(b+Wh_{t-1}+Ux_t)\,h_{t-1}$$

?

What I mean is: $h_t$ depends on $W$ both directly, through the term $Wh_{t-1}$ inside $f$, and indirectly, through $h_{t-1}$, which itself depends on $W$. So why is the second term (the direct one) missing from the expression above?
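To make the question concrete, here is a small finite-difference check, my own sketch rather than anything from the posts. It assumes $f=\tanh$ and uses scalars purely to keep shapes out of the way; the same recursion holds with Jacobians in the vector case. The recursion below includes the direct term $f'(a_t)\,h_{t-1}$ and is compared against a numerical derivative:

```python
import numpy as np

def forward(W, U, b, xs, h0=0.0):
    """Unroll h_t = tanh(b + W*h_{t-1} + U*x_t) for a scalar RNN."""
    h = h0
    for x in xs:
        h = np.tanh(b + W * h + U * x)
    return h

W, U, b = 0.5, 0.3, 0.1
xs = [1.0, -0.5, 2.0]

# Recursive total derivative dh_t/dW, *including* the direct term:
# dh_t/dW = f'(a_t) * (h_{t-1} + W * dh_{t-1}/dW), with a_t the pre-activation.
dh_dW, h = 0.0, 0.0
for x in xs:
    a = b + W * h + U * x
    dh_dW = (1 - np.tanh(a) ** 2) * (h + W * dh_dW)
    h = np.tanh(a)

# Finite-difference estimate of dh_T/dW
eps = 1e-6
num = (forward(W + eps, U, b, xs) - forward(W - eps, U, b, xs)) / (2 * eps)

print(dh_dW, num)  # the two values match: the recursion above is the total derivative
```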

Thank you!

Tags: derivation, rnn, deep-learning, machine-learning
