Imagine our model has two inputs X1, X2 and one output Y. Our input variables "interact" if, for example, the effect that X1 has on Y depends on the value of X2.
The simplest way to model an interaction is to add a term containing the product X1 * X2, for example:
Y = a*X1 + b*X2 + c*X1*X2 + d
The model above is non-linear in its inputs X1, X2: this is the kind of linearity Goodfellow is talking about. Note that it is linear in its parameters a, b, c, d though, so you will still see this problem called "linear regression."
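A minimal sketch of that distinction, using made-up parameter values (the vectors `p` and `q` and the helper names `features` and `predict` are assumptions for illustration only). The prediction is a dot product of the parameters with the features `[X1, X2, X1*X2, 1]`, so it adds up across parameter vectors, while the mixed difference over the inputs is non-zero and equals `c`:

```python
def features(x1, x2):
    """Feature vector for Y = a*X1 + b*X2 + c*X1*X2 + d."""
    return [x1, x2, x1 * x2, 1.0]


def predict(params, x1, x2):
    """Dot product of the parameters [a, b, c, d] with the features."""
    return sum(p * f for p, f in zip(params, features(x1, x2)))


# Hypothetical parameter vectors [a, b, c, d], chosen only for illustration.
p = [2.0, -1.0, 0.5, 3.0]
q = [1.0, 1.0, 1.0, 1.0]
p_plus_q = [pi + qi for pi, qi in zip(p, q)]

# Linear in the parameters: adding parameter vectors adds the predictions.
lhs = predict(p_plus_q, 3.0, 2.0)
rhs = predict(p, 3.0, 2.0) + predict(q, 3.0, 2.0)
print(lhs, rhs)  # 22.0 22.0

# Not additive in the inputs: the mixed difference isolates the
# interaction coefficient c (it would be 0 without the X1*X2 term).
mixed = (predict(p, 1, 1) - predict(p, 1, 0)
         - predict(p, 0, 1) + predict(p, 0, 0))
print(mixed)  # 0.5
```

Because the model is a plain dot product in the parameters, ordinary least squares can fit it once X1*X2 is supplied as an extra column.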
Example
Let's look at a specific example: Wikipedia's cookie baking data. Our inputs are Temperature and time (in the oven). Our output is cookie Yield.
- Increasing Temperature increases cookie yield when time is short.
- Increasing Temperature decreases cookie yield when time is long.
Therefore the Temperature and time variables interact.
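To see how a single product term can produce this sign flip, here is a small sketch. The coefficients are hypothetical, not fitted to Wikipedia's data, and the function names are made up for illustration. In the model Yield = d + a*Temperature + b*time + c*Temperature*time, the effect of one extra unit of Temperature is a + c*time, which changes sign when c is negative and time is large enough:

```python
# Hypothetical coefficients for
#   Yield = d + a*Temperature + b*time + c*Temperature*time
# They are NOT taken from Wikipedia's data; only the signs matter here.
a, b, c, d = 2.0, 1.0, -0.25, 10.0


def cookie_yield(temperature, time):
    return d + a * temperature + b * time + c * temperature * time


def temperature_effect(time):
    """Change in yield per one-unit increase in Temperature at a fixed time."""
    return a + c * time


short_time, long_time = 4.0, 12.0
print(temperature_effect(short_time))  # 1.0  (hotter oven helps)
print(temperature_effect(long_time))   # -1.0 (hotter oven hurts)

# The effect changes sign where a + c*time = 0.
crossover_time = -a / c
print(crossover_time)  # 8.0
```

Without the product term (c = 0), the Temperature effect would be the same constant a at every baking time, so no choice of a and b alone could reproduce both bullet points.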
The interaction term
What does the interaction term (X1*X2) in the model do?
You can think of it as varying or interpolating between two simpler, one-variable models, using a second variable. If our two simple models are:
- Model A: Y = a0 + a1 * Temperature
- Model B: Y = b0 + b1 * Temperature
Then our full model, with interaction, is:
Y = c_0 + c_a * (Model A) + c_b * time * (Model B)
Try simplifying it, and you'll see that you get four terms that look like the first model we wrote down:
Y = a*X1 + b*X2 + c*X1*X2 + d
![enter image description here](https://i.stack.imgur.com/RGnoz.png)
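
The simplification can be checked numerically. This is a sketch under a stated assumption: the blend is taken to be Y = c_0 + c_a * (Model A) + c_b * time * (Model B), so that time scales only Model B. All numeric values and function names below are hypothetical. Collecting terms gives one coefficient each for Temperature, time, Temperature*time, and a constant, which is the four-term form with X1 = Temperature and X2 = time:

```python
def model_a(temperature, a0, a1):
    return a0 + a1 * temperature


def model_b(temperature, b0, b1):
    return b0 + b1 * temperature


def blended(temperature, time, c_0, c_a, c_b, a0, a1, b0, b1):
    """Assumed blend: time scales the contribution of Model B only."""
    return (c_0
            + c_a * model_a(temperature, a0, a1)
            + c_b * time * model_b(temperature, b0, b1))


def collapsed(c_0, c_a, c_b, a0, a1, b0, b1):
    """Coefficients (a, b, c, d) of Y = a*X1 + b*X2 + c*X1*X2 + d."""
    a = c_a * a1          # multiplies Temperature
    b = c_b * b0          # multiplies time
    c = c_b * b1          # multiplies Temperature * time
    d = c_0 + c_a * a0    # constant
    return a, b, c, d


# Hypothetical values, for illustration only.
k = dict(c_0=4.0, c_a=1.0, c_b=1.0, a0=1.0, a1=2.0, b0=3.0, b1=-0.5)
a, b, c, d = collapsed(**k)
print(a, b, c, d)  # 2.0 3.0 -0.5 5.0

# Both forms agree on a small grid of inputs.
for temperature in (0.0, 10.0, 20.0):
    for time in (0.0, 2.0, 8.0):
        four_term = a * temperature + b * time + c * temperature * time + d
        assert abs(blended(temperature, time, **k) - four_term) < 1e-9
```

The interaction coefficient c comes entirely from Model B's slope b1, scaled by time: the further time moves from zero, the more the Temperature slope shifts away from Model A's.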