Why do we use an activation function to introduce nonlinearity instead of a polynomial perceptron?
I think of a single perceptron as a single linear function $y = a_1x_1 + a_2x_2 + \dots + a_nx_n + b_0$, where the goal is to find the combination of weights $a_1, a_2, \dots, a_n$ (and bias $b_0$) that minimizes the given loss function.
The problem with this type of network is that it cannot perform well on a nonlinear dataset, so an activation function is used to address this. I am wondering what would happen if, instead of a linear perceptron, we used a polynomial perceptron of the form $y = a_1x_1^k + a_2x_2^r + \dots + a_nx_n + b_0$, and how it would compare with the original perceptron.
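
To make the comparison concrete, here is a minimal sketch of the two forms in plain Python. The function names, the example values, and the choice of fixed (not learned) exponents are my own assumptions for illustration only; this is not meant as a reference implementation.

```python
def linear_perceptron(x, a, b0):
    """y = a_1*x_1 + ... + a_n*x_n + b_0"""
    return sum(a_i * x_i for a_i, x_i in zip(a, x)) + b0


def polynomial_perceptron(x, a, b0, powers):
    """y = a_1*x_1**p_1 + ... + a_n*x_n**p_n + b_0.

    Assumption: the exponents in `powers` are fixed in advance, not learned.
    """
    return sum(a_i * x_i ** p_i for a_i, x_i, p_i in zip(a, x, powers)) + b0


# Hypothetical example values, chosen only for illustration.
x = [2.0, 3.0, 1.0]
a = [0.5, -1.0, 2.0]
b0 = 0.25
powers = [2, 3, 1]  # stands in for the exponents k, r, ..., 1

y_poly = polynomial_perceptron(x, a, b0, powers)

# The same value comes out of the linear perceptron when it is fed
# the transformed inputs z_i = x_i**p_i.
z = [x_i ** p_i for x_i, p_i in zip(x, powers)]
y_lin_on_z = linear_perceptron(z, a, b0)
```

One thing the sketch makes visible: with fixed exponents, the polynomial version gives the same output as the linear perceptron applied to the transformed inputs $x_i^{p_i}$, so it is still linear in the weights $a_i$.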
Topic mlp perceptron deep-learning
Category Data Science