What issue is there, when training this network with gradient descent?
Suppose we have the following fully connected network made of perceptrons with a sign function as the activation unit, what issue arises, when trying to train this network with gradient descent?
Topic perceptron backpropagation gradient-descent
Category Data Science