Why is it valid to remove a constant factor from the derivative of an error function?
I was reading the book 'Make Your Own Neural Network' by Tariq Rashid. In his book (note: he is talking about ordinary feed-forward neural networks), he gives the slope of the error with respect to a weight as:

$$\frac{\partial E}{\partial W_{jk}} = -2\,(t_k - O_k)\cdot \text{sigmoid}\!\left(\sum_{j} W_{jk}\cdot O_j\right)\left(1 - \text{sigmoid}\!\left(\sum_{j} W_{jk}\cdot O_j\right)\right)\cdot O_j$$

Here $t_k$ is the target value at node $k$, $O_k$ is the predicted output at node $k$, $W_{jk}$ is the weight connecting node $j$ to node $k$, and $E$ is the error at node $k$.
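As I understand it, this comes from the chain rule applied to $E = (t_k - O_k)^2$ with $O_k = \text{sigmoid}\left(\sum_{j} W_{jk}\cdot O_j\right)$ (my own working, not quoted from the book):

$$\frac{\partial E}{\partial O_k} = -2\,(t_k - O_k), \qquad \frac{\partial O_k}{\partial W_{jk}} = \text{sigmoid}\!\left(\sum_{j} W_{jk}\cdot O_j\right)\left(1 - \text{sigmoid}\!\left(\sum_{j} W_{jk}\cdot O_j\right)\right)\cdot O_j$$

Multiplying the two factors gives the expression above.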
Then he says that we can remove the $2$ because we only care about the direction of the slope of the error function, and the $2$ is just a scaling factor. But if that is the argument, can't we also remove $\text{sigmoid}\left(\sum_{j} W_{jk}\cdot O_j\right)$, since we know it is always between $0$ and $1$ and so it would also just act as a scaling factor? Taking that further, we could remove everything after $(t_k - O_k)$, since we know that whole expression is between $0$ and $1$ and so it too would just act as a scaling factor. That would leave us with just:
$$t_k-O_k$$
which is definitely the wrong derivative.
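To make the question concrete, here is a small NumPy sketch of my own (the toy numbers are made up, not from the book) comparing the full derivative, the version with the $2$ removed, and the bare $(t_k - O_k)$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy numbers (made up): one output node k fed by three hidden nodes j.
O_j = np.array([0.2, 0.9, 0.5])    # hidden-node outputs O_j
W_jk = np.array([0.4, -0.3, 0.8])  # weights into node k
t_k = 1.0                          # target value at node k

O_k = sigmoid(W_jk @ O_j)          # predicted output at node k

# Full derivative dE/dW_jk for E = (t_k - O_k)^2, one component per weight j.
grad_full = -2 * (t_k - O_k) * O_k * (1 - O_k) * O_j

# With the constant 2 removed: every component is scaled by the same 1/2,
# so the ratio to the full derivative is the same constant for every weight.
grad_no_2 = grad_full / 2
print(grad_full / grad_no_2)       # [2. 2. 2.]

# With everything after (t_k - O_k) removed: the per-weight factor O_j is gone,
# so the ratio to the full derivative differs from weight to weight.
grad_bare = (t_k - O_k) * np.ones_like(W_jk)
print(grad_full / grad_bare)       # three different numbers
```

Removing the $2$ rescales every component of the gradient identically, while removing the rest rescales each weight's component by a different, input-dependent amount; that difference is what my question is about.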
If we can't remove that whole expression, then why did he remove the $2$, when both are just scaling factors?
Topic derivation deep-learning neural-network machine-learning
Category Data Science