Why do we use an activation function to introduce nonlinearity instead of a polynomial perceptron implementation?

I think of a single perceptron as a single linear function $y = w_1x_1 + w_2x_2 + \dots + w_nx_n + b_0$, whose goal is to find the combination of weights $w_1, w_2, \dots, w_n$ (and bias $b_0$) that minimizes a given loss function.

The problem with this type of network is that it cannot perform well on a nonlinear dataset, which is why an activation function is used. I am wondering what would happen if, instead of a linear perceptron, we introduced a polynomial perceptron of the form $y = w_1x_1^{k} + w_2x_2^{r} + \dots + w_nx_n^{m} + b_0$, and how this would compare with the original perceptron.
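For concreteness, here is a minimal NumPy sketch of the two variants I have in mind; the toy dataset, the squared-loss training loop, and the fixed exponents k = r = 2 are placeholders purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear dataset: the label depends on x1^2 + x2^2,
# so it is not linearly separable in the raw inputs.
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 0.5).astype(float)

def fit(features, targets, lr=0.3, epochs=2000):
    """Gradient descent on squared loss for pred = features @ w + b."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = features @ w + b - targets
        w -= lr * features.T @ err / len(targets)
        b -= lr * err.mean()
    return w, b

def accuracy(features, targets, w, b):
    return ((features @ w + b > 0.5) == targets).mean()

# Plain linear perceptron: y = w1*x1 + w2*x2 + b0
w_lin, b_lin = fit(X, y)

# "Polynomial perceptron": fixed exponents applied to each input,
# y = w1*x1**k + w2*x2**r + b0 (k = r = 2 chosen only for this example)
k, r = 2, 2
X_poly = np.column_stack([X[:, 0] ** k, X[:, 1] ** r])
w_poly, b_poly = fit(X_poly, y)

print("linear     accuracy:", accuracy(X, y, w_lin, b_lin))
print("polynomial accuracy:", accuracy(X_poly, y, w_poly, b_poly))
```

On this particular toy dataset the linear version can do little better than predicting the majority class, while the version with fixed-exponent inputs fits it far better, which is roughly the comparison I am asking about.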

Topic: mlp, perceptron, deep-learning

Category: Data Science


Well, from a purely theoretical point of view, thanks to the universal approximation theorem, absolutely nothing would change.

The main issue is computational. You can find more information here. In short, you want activation functions that are cheap to compute (polynomials are fine there), that have regions where the derivative is monotonic (here polynomials are not good), and that approximate the identity near the origin (again, polynomials are not good).
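A quick numerical sketch of those last two points (NumPy assumed; tanh and x**3 are only stand-ins for "a typical activation" and "a polynomial one"):

```python
import numpy as np

# Compare a typical activation (tanh) with a polynomial "activation" (x**3).
x = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])

print("tanh(x):", np.tanh(x))   # ≈ [-0.995, -0.0997, 0.0, 0.0997, 0.995]
print("x**3   :", x ** 3)       # [-27.0, -0.001, 0.0, 0.001, 27.0]

# Near the origin tanh(x) ≈ x, i.e. it approximates the identity,
# while x**3 flattens out there and blows up away from it.

d_tanh = 1 - np.tanh(x) ** 2    # derivative of tanh, bounded in (0, 1]
d_cube = 3 * x ** 2             # derivative of x**3, unbounded

print("tanh'(x):", d_tanh)      # ≈ [0.0099, 0.99, 1.0, 0.99, 0.0099]
print("(x**3)' :", d_cube)      # [27.0, 0.03, 0.0, 0.03, 27.0]

# The polynomial's gradient vanishes exactly where the function should
# behave like the identity and explodes for large inputs, which is part
# of what makes gradient-based training with it awkward in practice.
```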
