Should interaction terms be included in linear regression analysis?

I am working on a linear model with 6 independent variables and when thinking about including an interaction I got lost.

An interaction exists if the level of one independent variable is affected by another independent variable. Doesn't that therefore mean that if an interaction exists there may also be collinearity problems? Similarly, if the correlation is low between the two variables, then that should imply there is no interaction?

I hope my question makes sense and that someone can help me clear up this confusion. I am now confused about how we can ever have an interaction between two independent variables.

Topic collinearity linear-regression predictive-modeling

Category Data Science


Several mixed-effects models are available for identifying interaction effects. However, you need to plan and implement a correct statistical design. Choosing an appropriate model, in terms of the fixed-effects or random-effects assumption, is a crucial part of the process.


Just to expand on @Ankita Talwar's answer and give some slightly more formal intuition, you can write a linear model with two regressors and their interaction as follows: $$ y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 $$ where $x_1 x_2$ is the interaction term. Refactoring, you can see that the interaction can be absorbed into the coefficient of $x_1$, making it depend on $x_2$: $ y = w_0 + v_1(x_2) x_1 + w_2 x_2$, where now $v_1(x_2) = w_1 + w_3 x_2$ is a function of $x_2$ (alternatively, you can view the interaction as modifying the coefficient of $x_2$). So, when you add an interaction term, you allow the coefficient of one variable to vary with the value of the other variable. This operation only touches the coefficients, not the variables themselves, so it says nothing about whether they are collinear.
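
As a minimal sketch of this point, the snippet below simulates data from a model with an interaction, fits it by ordinary least squares with NumPy, and evaluates the effective slope of $x_1$, namely $w_1 + w_3 x_2$ (with $w_3$ the interaction coefficient), at a few values of $x_2$. The true coefficients, the noise level, and the helper name `slope_of_x1` are assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Hypothetical true coefficients (w0, w1, w2, w3), chosen only for illustration.
w_true = np.array([1.0, 2.0, -1.0, 0.5])

# Design matrix: intercept, x1, x2, and the interaction x1 * x2.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
y = X @ w_true + rng.normal(scale=0.1, size=n)

# Ordinary least squares fit.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)


def slope_of_x1(x2_value, w=w_hat):
    """Effective coefficient of x1 at a given x2: w1 + w3 * x2."""
    return w[1] + w[3] * x2_value


for value in (-1.0, 0.0, 1.0):
    print(f"x2 = {value:+.1f} -> slope of x1 = {slope_of_x1(value):.3f}")
```

The printed slopes change with $x_2$ even though $x_1$ and $x_2$ were drawn independently, which is the sense in which the interaction acts on the coefficients rather than on the variables.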

I guess your question might come from the term $x_1 x_2$ in the linear model, noticing that this term depends on $x_1$ (or $x_2$). If that is the case, please notice that $x_1 x_2$ would be collinear with, let's say, $x_1$ if $x_2$ were a constant and not a variable (and vice versa). When both are variables (and, of course, provided the original variables are not linearly related), the interaction term is not an exact linear function of either of the two variables.
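
One way to check this numerically is to look at the rank of the design matrix. The sketch below (simulated data; the helper name `design` is hypothetical) compares the case where $x_2$ really varies with the case where it is a constant.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2_variable = rng.normal(size=n)   # x2 varies across observations
x2_constant = np.full(n, 3.0)      # x2 is the same for every observation


def design(x1, x2):
    """Columns: intercept, x1, x2, interaction x1 * x2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])


rank_variable = np.linalg.matrix_rank(design(x1, x2_variable))
rank_constant = np.linalg.matrix_rank(design(x1, x2_constant))

print("rank with variable x2:", rank_variable)
print("rank with constant x2:", rank_constant)
```

With a varying $x_2$ the four columns are linearly independent. With a constant $x_2$, the $x_2$ column is a multiple of the intercept and the interaction column is a multiple of $x_1$, so the matrix loses rank.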

However, you might still encounter some collinearity in practice: if a variable's mean is far from zero relative to its spread (as often happens with raw, unscaled measurements), the product $x_1 x_2$ behaves almost like a rescaled copy of the other variable and is therefore strongly correlated with it. Standardizing your variables, and centering them in particular, largely removes this problem (and is anyway a good idea in regression problems).
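
To illustrate the effect of standardizing, here is a small sketch on simulated data. The means and standard deviations are arbitrary assumptions, and `corr` and `standardize` are hypothetical helper names. It compares the correlation between $x_1$ and the product $x_1 x_2$ before and after standardizing both variables.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Two independent variables whose means are far from zero (illustrative values).
x1 = rng.normal(loc=10.0, scale=1.0, size=n)
x2 = rng.normal(loc=50.0, scale=2.0, size=n)


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


def standardize(x):
    """Center to zero mean and scale to unit standard deviation."""
    return (x - x.mean()) / x.std()


# Correlation between x1 and the interaction built from the raw variables.
raw_corr = corr(x1, x1 * x2)

# Same correlation after standardizing both variables first.
z1, z2 = standardize(x1), standardize(x2)
std_corr = corr(z1, z1 * z2)

print(f"raw:          corr(x1, x1*x2) = {raw_corr:.3f}")
print(f"standardized: corr(z1, z1*z2) = {std_corr:.3f}")
```

In this setup the raw interaction is strongly correlated with $x_1$, while the interaction built from the standardized variables is close to uncorrelated with it. The fitted model spans the same predictions either way; only the parametrization and its numerical conditioning change.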

Hope this clarifies a bit.


An interaction effect occurs when two or more independent variables jointly affect the dependent variable. Whether two independent variables interact in their effect on the outcome does not depend on any relationship between those variables. We might have a situation where two variables, each on its own, do not significantly affect the outcome variable, and are even independent of each other, yet together affect the dependent variable significantly.
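
A minimal simulated sketch of such a situation: the two predictors are drawn independently, and the outcome depends only on their product. The data-generating process and the helper name `r_squared` are assumptions for illustration, not taken from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)  # drawn independently of x1

# Outcome driven purely by the interaction, plus a little noise.
y = x1 * x2 + rng.normal(scale=0.1, size=n)


def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return 1.0 - residuals.var() / y.var()


ones = np.ones(n)
r2_main = r_squared(np.column_stack([ones, x1, x2]), y)
r2_inter = r_squared(np.column_stack([ones, x1, x2, x1 * x2]), y)

print(f"main effects only: R^2 = {r2_main:.3f}")
print(f"with interaction:  R^2 = {r2_inter:.3f}")
```

The main-effects model explains almost nothing, while adding the interaction term explains nearly all of the variance, even though $x_1$ and $x_2$ are uncorrelated.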
