Proof of Correctness of Perceptron Training Rule
The Perceptron Training Rule is essentially Stochastic Gradient Descent applied to finding the coefficients of a hyperplane (which acts as a Decision Boundary) for binary classification of data points (instances).
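For concreteness, here is a minimal sketch of the rule as I understand it (the names `perceptron_train`, `learning_rate`, and `epochs` are just my own illustration, not from any library; labels are assumed to be in {-1, +1}):

```python
import numpy as np

def perceptron_train(X, y, learning_rate=0.1, epochs=100):
    """Minimal sketch of the Perceptron Training Rule.

    X: (n_samples, n_features) array of training instances.
    y: array of target labels in {-1, +1}.
    Returns the learned weight vector, with the bias as w[0].
    """
    # Prepend a constant 1 to every instance so the bias is learned as w[0].
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(X.shape[1])

    for _ in range(epochs):
        for x_i, t in zip(X, y):
            o = 1 if np.dot(w, x_i) >= 0 else -1  # current perceptron output
            # Non-zero update only for misclassified examples (t - o == 0 otherwise).
            w += learning_rate * (t - o) * x_i
    return w
```

On linearly separable data the inner update stops changing `w` once every example is classified correctly; my question is why this is guaranteed to happen after finitely many updates.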
I have read that this rule can be proved to find the coefficients (aka weights) of a hyperplane Decision Boundary that classifies all training examples correctly, given the following (stated more formally below):
- the training examples are linearly separable, and
- a sufficiently small Learning Rate is used.
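Using my own notation ($\eta$ the learning rate, $t$ the target label of an instance $x$, and $o$ the perceptron's current output on $x$), the rule and the claim would read:

```latex
% Perceptron Training Rule: one weight update per presented example
% (non-zero only when the example is misclassified, since then t - o = +-2).
\[
  w_i \;\leftarrow\; w_i + \eta\,(t - o)\,x_i
\]
% Claim to be proved: if the training examples are linearly separable and
% \eta is sufficiently small, then repeatedly cycling through the examples
% changes w only finitely many times, and the final w satisfies
\[
  \operatorname{sign}(w \cdot x) = t \quad \text{for every training example } (x, t).
\]
```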
Could anyone please prove the above?
Topic perceptron neural-network machine-learning
Category Data Science