Imagine our model has two inputs, X1 and X2, and one output, Y. Our input variables "interact" if, for example, the effect that X1 has on Y depends on the value of X2.
The simplest way to model an interaction is to add the product X1 * X2 as an extra term, for example:
Y = a*X1 + b*X2 + c*X1*X2 + d
The model above is non-linear in its inputs X1, X2: linearity in the inputs is the kind of linearity Goodfellow is talking about. Note that the model is still linear in its parameters a, b, c, d, so you will still see this problem called "linear regression."
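A quick numeric check can make the distinction between the two kinds of linearity concrete. The sketch below uses made-up coefficient values (they are assumptions for illustration, not fitted to any data) and a hypothetical helper named predict.

```python
def predict(x1, x2, a, b, c, d):
    """Y = a*X1 + b*X2 + c*X1*X2 + d"""
    return a * x1 + b * x2 + c * x1 * x2 + d

# Two hypothetical parameter sets (a, b, c, d), chosen only for illustration.
p = (2.0, 3.0, 0.5, 0.0)
q = (1.0, -1.0, 2.0, 4.0)
x1, x2 = 3.0, 5.0

# Linear in the parameters: predicting with p + q gives the same result
# as adding the two separate predictions.
p_plus_q = tuple(pi + qi for pi, qi in zip(p, q))
params_additive = (
    predict(x1, x2, *p_plus_q) == predict(x1, x2, *p) + predict(x1, x2, *q)
)

# Not linear in the inputs: doubling both inputs does not double the output,
# even though d = 0 in p, because the X1*X2 term grows fourfold.
inputs_scale = predict(2 * x1, 2 * x2, *p) == 2 * predict(x1, x2, *p)

print(params_additive, inputs_scale)
```

The first check holds for any inputs, which is why ordinary least squares can still fit a, b, c and d: X1*X2 is just one more column of numbers as far as the fit is concerned.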
Example
Let's look at a specific example: Wikipedia's cookie baking data. Our inputs are Temperature and time (in the oven). Our output is cookie Yield.
- Increasing Temperature increases cookie yield when time is short.
- Increasing Temperature decreases cookie yield when time is long.
Therefore the Temperature and time variables interact.
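The sign flip described above is what a negative interaction coefficient produces. Here is a minimal sketch with invented coefficients (they are not Wikipedia's numbers, and cookie_yield and temperature_effect are hypothetical names): the effect of one extra degree works out to a + c * time, which changes sign as time grows.

```python
def cookie_yield(temperature, time, a=0.3, b=6.0, c=-0.02, d=0.0):
    """Yield = a*Temperature + b*time + c*Temperature*time + d.

    All coefficients are made up for illustration.
    """
    return a * temperature + b * time + c * temperature * time + d

def temperature_effect(time, temperature=350.0):
    """Change in yield from raising Temperature by one degree at a fixed time."""
    return cookie_yield(temperature + 1, time) - cookie_yield(temperature, time)

short_effect = temperature_effect(time=5)   # a + c*5 is positive here
long_effect = temperature_effect(time=20)   # a + c*20 is negative here

print(short_effect > 0, long_effect < 0)
```

Without the c term, temperature_effect would return the same value for every time, and the model could not represent the pattern in the two bullet points.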
The interaction term
What does the interaction term (X1*X2) in the model do?
You can think of it as varying or interpolating between two simpler, one-variable models, using a second variable. If our two simple models are:
- Model A: Y = a0 + a1 * Temperature
- Model B: Y = b0 + b1 * Temperature
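Then our full model, with interaction, blends the two according to time. With time rescaled so that 0 is the shortest bake and 1 is the longest:
Y = (1 - time) * (Model A) + time * (Model B)
Expand it and collect terms, and you'll see that you get four terms that look like the first model we wrote down, with X1 = Temperature and X2 = time:
Y = a*X1 + b*X2 + c*X1*X2 + d
where a = a1, b = b0 - a0, c = b1 - a1 and d = a0.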
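The interpolation idea can also be checked numerically. The sketch below assumes time has been rescaled to run from 0 to 1 and blends the two models as (1 - time) * (Model A) + time * (Model B). That weighting and every coefficient value are assumptions chosen for illustration.

```python
import math

# Hypothetical coefficients for the two one-variable models.
A0, A1 = 10.0, 0.5    # Model A: Y = a0 + a1 * Temperature
B0, B1 = 40.0, -0.25  # Model B: Y = b0 + b1 * Temperature

def model_a(temperature):
    return A0 + A1 * temperature

def model_b(temperature):
    return B0 + B1 * temperature

def interpolated(temperature, time):
    """Blend Model A (time = 0) into Model B (time = 1)."""
    return (1 - time) * model_a(temperature) + time * model_b(temperature)

def four_term(temperature, time):
    """Same model written as a*X1 + b*X2 + c*X1*X2 + d."""
    a, b, c, d = A1, B0 - A0, B1 - A1, A0
    return a * temperature + b * time + c * temperature * time + d

# Compare the two forms on a small grid of inputs.
grid = [(temp, t) for temp in (300.0, 350.0, 400.0) for t in (0.0, 0.25, 0.5, 1.0)]
forms_agree = all(
    math.isclose(interpolated(temp, t), four_term(temp, t)) for temp, t in grid
)
print(forms_agree)
```

Read this way, the interaction coefficient c is the difference between the two Temperature slopes, so c = 0 means Model A and Model B have the same slope and time only shifts the line up or down.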
