Why can't a linear model understand the interaction between any two input features?

Question

Gull Noor

February 7, 2022, 17:38

The book Deep Learning by Ian Goodfellow states that:

Linear models also have the obvious defect that the model capacity is limited to 
linear functions, so the model cannot understand the interaction between any two 
input variables.

What is meant by interaction between variables?
How do nonlinear models find it?

Would be great if someone can give an intuitive/graphical/geometrical explanation.

Topic linear-algebra deep-learning

Category Data Science

bogovicj · Accepted Answer · February 7, 2022, 17:38

Imagine our model has two inputs, X1 and X2, and one output, Y. Our input variables "interact" if, for example, the effect that X1 has on Y depends on the value of X2.

The simplest way to model an interaction is to add a term containing the product X1 * X2, for example:

Y = a*X1 + b*X2 + c*X1*X2 + d

The model above is non-linear in its inputs X1, X2: linearity in the inputs is the kind of linearity Goodfellow is talking about. Note that it is still linear in its parameters a, b, c, d though, so you will still see this problem called "linear regression."
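
To make "the effect that X1 has on Y depends on the value of X2" concrete, here is a minimal sketch. The coefficient values and the helper names `predict` and `effect_of_x1` are made up for illustration. It measures how much Y changes when X1 goes from 0 to 1, at several fixed values of X2, with and without the interaction term.

```python
def predict(x1, x2, a=1.0, b=2.0, c=0.0, d=0.5):
    """Y = a*X1 + b*X2 + c*X1*X2 + d (coefficient values are made up)."""
    return a * x1 + b * x2 + c * x1 * x2 + d

def effect_of_x1(x2, **coef):
    """Change in Y when X1 goes from 0 to 1 while X2 is held fixed."""
    return predict(1.0, x2, **coef) - predict(0.0, x2, **coef)

# Without an interaction term (c = 0) the effect of X1 is the same for every X2.
additive = [effect_of_x1(x2, c=0.0) for x2 in (0.0, 1.0, 2.0)]

# With an interaction term the effect of X1 equals a + c*X2, so it changes with X2.
interacting = [effect_of_x1(x2, c=-1.0) for x2 in (0.0, 1.0, 2.0)]

print(additive)     # [1.0, 1.0, 1.0]
print(interacting)  # [1.0, 0.0, -1.0]
```

With c = 0 the slope in X1 is always a, whatever X2 is. That is why a model that is linear in its inputs cannot represent an effect that shrinks, grows, or flips sign as the other input changes.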

Example

Let's look at a specific example: Wikipedia's cookie baking data. Our inputs are Temperature and time (in the oven). Our output is cookie Yield.

Increasing Temperature increases cookie yield when time is short.
Increasing Temperature decreases cookie yield when time is long.

Therefore the Temperature and time variables interact.
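
The sign flip described above can be sketched with a tiny 2x2 experiment. The yields below are made-up numbers, not Wikipedia's data, and the -1/+1 coding of the levels is an assumption chosen so that the least-squares coefficients reduce to simple averages. The sketch compares the temperature effect predicted by a model without the Temperature * time term against one that includes it.

```python
# Made-up yields for a 2x2 baking experiment (NOT Wikipedia's numbers).
# Levels are coded -1 (low Temperature / short time) and +1 (high / long).
runs = [
    # (Temperature, time, Yield)
    (-1, -1, 2.0),
    (+1, -1, 8.0),
    (-1, +1, 7.0),
    (+1, +1, 3.0),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# For this balanced +/-1 design the least-squares coefficients are plain averages.
d = mean(y for _, _, y in runs)          # intercept
a = mean(T * y for T, _, y in runs)      # Temperature coefficient
b = mean(t * y for _, t, y in runs)      # time coefficient
c = mean(T * t * y for T, t, y in runs)  # Temperature * time coefficient

def additive(T, t):
    """Linear in the inputs: no interaction term."""
    return d + a * T + b * t

def with_interaction(T, t):
    """Same model plus the Temperature * time term."""
    return d + a * T + b * t + c * T * t

def temperature_effect(model, t):
    """Predicted change in Yield when Temperature goes from low to high."""
    return model(+1, t) - model(-1, t)

observed = {(T, t): y for T, t, y in runs}
for t, label in ((-1, "short time"), (+1, "long time")):
    print(
        label,
        "| observed:", observed[(+1, t)] - observed[(-1, t)],
        "| additive:", temperature_effect(additive, t),
        "| with interaction:", temperature_effect(with_interaction, t),
    )
```

In this toy data, raising Temperature adds 6 to the yield at short time and removes 4 at long time. The additive model predicts the same change (+1) in both cases, because its Temperature slope cannot depend on time. The model with the product term reproduces both observed effects.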

The interaction term

What does the interaction term (X1*X2) in the model do?

You can think of it as varying or interpolating between two simpler, one-variable models, using a second variable. If our two simple models are:

Model A: Y = a0 + a1 * Temperature
Model B: Y = b0 + b1 * Temperature

Then our full model, with interaction, blends the two, using time (rescaled to lie between 0 and 1) to set the mix:

Y = c_0 + c_a * (1 - time) * (Model A) + c_b * time * (Model B)

Try simplifying it, and you'll see that you get four terms that look like the first model we wrote down:

Y = a*X1 + b*X2 + c*X1*X2 + d
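
As a sanity check on that simplification, here is a minimal sketch. It assumes the blend weights Model A by (1 - time) and Model B by time, with time rescaled to lie between 0 and 1. That weighting, the function names, and all numeric values are illustrative assumptions. Under that assumption, expanding the blend gives, with X1 = Temperature and X2 = time:

a = c_a*a1, b = c_b*b0 - c_a*a0, c = c_b*b1 - c_a*a1, d = c_0 + c_a*a0

```python
def blended(T, t, a0, a1, b0, b1, c_0=0.0, c_a=1.0, c_b=1.0):
    """Blend Model A and Model B; t is time rescaled to [0, 1] (an assumption)."""
    model_a = a0 + a1 * T
    model_b = b0 + b1 * T
    return c_0 + c_a * (1 - t) * model_a + c_b * t * model_b

def four_term_coefficients(a0, a1, b0, b1, c_0=0.0, c_a=1.0, c_b=1.0):
    """Coefficients of Y = a*X1 + b*X2 + c*X1*X2 + d obtained by expanding the blend."""
    return {
        "a": c_a * a1,
        "b": c_b * b0 - c_a * a0,
        "c": c_b * b1 - c_a * a1,
        "d": c_0 + c_a * a0,
    }

# Made-up one-variable models: Model A rises with Temperature, Model B falls.
params = dict(a0=2.0, a1=3.0, b0=7.0, b1=-2.0)
coef = four_term_coefficients(**params)

def four_term(T, t):
    return coef["a"] * T + coef["b"] * t + coef["c"] * T * t + coef["d"]

# The two forms should agree everywhere; check a small grid of points.
max_gap = max(
    abs(blended(T, t, **params) - four_term(T, t))
    for T in (0.0, 1.0, 2.0)
    for t in (0.0, 0.5, 1.0)
)
print(coef)
print(max_gap)
```

The interaction coefficient c is the difference between the two Temperature slopes, so it is zero exactly when Model A and Model B have the same slope. In that case time only shifts the line up or down and there is no interaction.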
