Backpropagation with step or threshold activation function
I understand that gradient descent is local and deals only with a neuron's inputs, its output, and what it should have output. In everything I've seen, gradient descent requires the activation function to be differentiable, so a threshold function cannot be used.
Yet biological neurons either fire or they don't. The input to the neuron, as I understand it, is the equivalent of the membrane potential. Once it passes a certain threshold, the neuron fires (once or multiple times) and the input is reset. Considering this, a step function appears to be enough to reproduce the behavior of biological neurons.
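To make the setup concrete, here is a minimal sketch of the kind of thresholded neuron I mean (the threshold of 5 and the specific numbers are just placeholders):

```python
import numpy as np

# A single thresholded ("step") neuron: it fires (outputs 1) only when the
# weighted sum of its inputs exceeds the threshold.
def step_neuron(weights, inputs, threshold=5.0):
    net = np.dot(weights, inputs)
    return 1.0 if net > threshold else 0.0

weights = np.array([1.0, 2.0, 0.5])
inputs = np.array([1.0, 2.0, 1.0])
print(step_neuron(weights, inputs))  # 1.0, because the net input 5.5 > 5
```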
I'm thinking about the usual weights (integer or even float) paired with the step function, trained with the following form of backpropagation:
Looking at one neuron and its incoming connections while backpropagating: it has an output of 1 (it fired) when it should have been 0. The neuron has a threshold of 5 and its net input summed to more than 5.
To train it, scale the weights down so that the net input decreases, controlled by an arbitrary learning rate, so that the threshold is no longer exceeded (or is at least closer to not being exceeded). For example, the net input was 5.5, so the "error" is 0.5. Multiplying this "error" by a learning rate of 0.1 gives 0.05, meaning the weights should be reduced by 5% (multiplied by 0.95). The weights change only a little for each training example.
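For concreteness, a minimal single-neuron sketch of the update I have in mind; the function name and the handling of the opposite case (should have fired but didn't) are my own guesses at the details:

```python
import numpy as np

# One update of the rule described above, for the case where the neuron
# fired (output 1) but the target was 0. The symmetric case (didn't fire
# but should have) is my own guess at how the idea would extend.
def train_step(weights, inputs, target, threshold=5.0, lr=0.1):
    net = np.dot(weights, inputs)
    output = 1.0 if net > threshold else 0.0
    if output == target:
        return weights                        # nothing to correct
    error = abs(net - threshold)              # e.g. |5.5 - 5| = 0.5
    if output == 1.0:                         # fired but shouldn't have: shrink
        return weights * (1.0 - lr * error)   # e.g. weights * 0.95
    else:                                     # didn't fire but should have: grow
        return weights * (1.0 + lr * error)

# The numbers from the example above: net input 5.5, threshold 5, target 0
weights = np.array([1.0, 2.0, 0.5])
inputs = np.array([1.0, 2.0, 1.0])               # net input = 5.5
weights = train_step(weights, inputs, target=0)  # each weight scaled by 0.95
```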
In my understanding, this is similar to backpropagation with gradient descent, but applied to step activation functions, and it works even though they are not differentiable.
My question is, where are the implementations of this very simple training method? Why wouldn't it work?
Topic: training, backpropagation, binary
Category: Data Science