Why do we use an activation function to introduce nonlinearity instead of a polynomial perceptron implementation?

I think of a single perceptron as a single linear function $y = w_1x_1 + w_2x_2 + \dots + w_nx_n + b_0$, whose goal is to find the combination of weights $w_1, w_2, \dots, w_n$ (and bias $b_0$) that minimizes a given loss function.

The problem with this type of network is that it cannot perform well on a nonlinear dataset, which is why an activation function is used. I am wondering what would happen if, instead of a linear perceptron, we introduced a polynomial perceptron of the form $y = w_1x_1^{k} + w_2x_2^{r} + \dots + w_nx_n^{m} + b_0$, and how this would compare with the original perceptron.
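For concreteness, here is a minimal NumPy sketch of the two variants I have in mind; the toy dataset, the squared-loss training loop, and the fixed exponents k = r = 2 are placeholders purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear dataset: the label depends on x1^2 + x2^2,
# so it is not linearly separable in the raw inputs.
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 0.5).astype(float)

def fit(features, targets, lr=0.3, epochs=2000):
    """Gradient descent on squared loss for pred = features @ w + b."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = features @ w + b - targets
        w -= lr * features.T @ err / len(targets)
        b -= lr * err.mean()
    return w, b

def accuracy(features, targets, w, b):
    return ((features @ w + b > 0.5) == targets).mean()

# Plain linear perceptron: y = w1*x1 + w2*x2 + b0
w_lin, b_lin = fit(X, y)

# "Polynomial perceptron": fixed exponents applied to each input,
# y = w1*x1**k + w2*x2**r + b0 (k = r = 2 chosen only for this example)
k, r = 2, 2
X_poly = np.column_stack([X[:, 0] ** k, X[:, 1] ** r])
w_poly, b_poly = fit(X_poly, y)

print("linear     accuracy:", accuracy(X, y, w_lin, b_lin))
print("polynomial accuracy:", accuracy(X_poly, y, w_poly, b_poly))
```

On this particular toy dataset the linear version can do little better than predicting the majority class, while the version with fixed-exponent inputs fits it far better, which is roughly the comparison I am asking about.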

Topic: mlp, perceptron, deep-learning

Category: Data Science


Well, from a purely theoretical point of view, thanks to the universal approximation theorem, absolutely nothing would change.

The main issue is computational. You can find more information here. In short, you want activation functions that are cheap to compute (polynomials are fine there), that have regions where the derivative is monotonic (here polynomials are not good), and that approximate the identity near the origin (again, polynomials are not good).
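A quick numerical sketch of those last two points (NumPy assumed; tanh and x**3 are only stand-ins for "a typical activation" and "a polynomial one"):

```python
import numpy as np

# Compare a typical activation (tanh) with a polynomial "activation" (x**3).
x = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])

print("tanh(x):", np.tanh(x))   # ≈ [-0.995, -0.0997, 0.0, 0.0997, 0.995]
print("x**3   :", x ** 3)       # [-27.0, -0.001, 0.0, 0.001, 27.0]

# Near the origin tanh(x) ≈ x, i.e. it approximates the identity,
# while x**3 flattens out there and blows up away from it.

d_tanh = 1 - np.tanh(x) ** 2    # derivative of tanh, bounded in (0, 1]
d_cube = 3 * x ** 2             # derivative of x**3, unbounded

print("tanh'(x):", d_tanh)      # ≈ [0.0099, 0.99, 1.0, 0.99, 0.0099]
print("(x**3)' :", d_cube)      # [27.0, 0.03, 0.0, 0.03, 27.0]

# The polynomial's gradient vanishes exactly where the function should
# behave like the identity and explodes for large inputs, which is part
# of what makes gradient-based training with it awkward in practice.
```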
