How to do backward feature elimination when considering interactions between features

I have a multiple linear regression problem,

$Y$ is my target and $X_1, X_2, X_3$ are my features.

In my regression, I consider the pairwise interactions between $X_1, X_2, X_3$ and I add a bias (intercept) term.

So my model is given by: $Y \sim X_1 + X_2 + X_3 + X_1 X_2 + X_1 X_3 + X_2 X_3 + \text{bias}$

Now, I fit my model with statsmodels (`import statsmodels.api as sm`) and I want to recursively eliminate the feature with the highest p-value.
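
Here is a minimal sketch of the loop I have in mind, assuming the data sit in a pandas DataFrame `df` with (hypothetical) columns `X1`, `X2`, `X3` and a target series `y`:

```python
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(X, y, threshold=0.05):
    """Refit OLS, dropping the regressor with the highest p-value,
    until every remaining p-value is below the threshold.
    The intercept ("const") is never dropped."""
    X = sm.add_constant(X)  # adds the bias term as a "const" column
    while True:
        model = sm.OLS(y, X).fit()
        pvalues = model.pvalues.drop("const")  # keep the intercept
        if pvalues.empty or pvalues.max() < threshold:
            return model
        X = X.drop(columns=pvalues.idxmax())

# Design matrix with all pairwise interactions (df and y are hypothetical):
# X = df[["X1", "X2", "X3"]].copy()
# X["X1X2"] = X["X1"] * X["X2"]
# X["X1X3"] = X["X1"] * X["X3"]
# X["X2X3"] = X["X2"] * X["X3"]
# final_model = backward_eliminate(X, y)
# print(final_model.summary())
```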

  • My first question is: for example, if the highest p-value is for the $X_1 X_2$ feature, is it okay to eliminate this feature even when $X_1$ and $X_2$ are statistically significant?
  • My second question: in the case where all the interactions of some feature have a p-value greater than 0.05 in the first iteration, could I eliminate this feature and all of its interactions?

Thank you for your help

Topic statsmodels linear-regression feature-selection

Category Data Science


My first question is: for example, if the highest p-value is for the $X_1 X_2$ feature, is it okay to eliminate this feature even when $X_1$ and $X_2$ are statistically significant?

Yes, that is fine: an interaction can carry no information about the target even when its main effects do. For example, if the target is fully determined by $X_1$ and $X_2$ alone, the interaction $X_1 \cdot X_2$ won't add anything to the model.
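
A quick (hypothetical) simulation makes this concrete: if $Y$ is generated from $X_1$ and $X_2$ alone, the fitted interaction term typically comes out insignificant.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({"X1": rng.normal(size=n), "X2": rng.normal(size=n)})
# Y depends only on X1 and X2; the interaction is irrelevant by construction.
y = 2.0 * X["X1"] - 1.0 * X["X2"] + rng.normal(scale=0.5, size=n)

X["X1X2"] = X["X1"] * X["X2"]
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.pvalues)  # the p-value for "X1X2" is typically large
```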

My second question: in the case where all the interactions of some feature have a p-value greater than 0.05 in the first iteration, could I eliminate this feature and all of its interactions?

I would try a more experimental approach: remove them only if they do not improve the model's accuracy, rather than basing the decision on p-values alone.
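
One way to make that check concrete is to compare cross-validated scores with and without the candidate terms. A sketch with scikit-learn, reusing the hypothetical `df` and `y` from above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def cv_r2(X, y):
    # Mean 5-fold cross-validated R^2 for a plain linear regression.
    return np.mean(cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2"))

# with_term = df[["X1", "X2", "X3", "X1X2"]]
# without_term = df[["X1", "X2", "X3"]]
# if cv_r2(without_term, y) >= cv_r2(with_term, y):
#     print("dropping X1X2 does not hurt accuracy")
```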

As a further recommendation, I would suggest scikit-learn (sklearn).
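
For example (one possible sketch, not the only route): `PolynomialFeatures` can build the pairwise interaction columns and `RFE` can do the recursive elimination, though note that RFE ranks features by coefficient magnitude rather than by p-value.

```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

pipeline = Pipeline([
    # degree=2 with interaction_only=True yields X1, X2, X3, X1X2, X1X3, X2X3
    ("interactions", PolynomialFeatures(degree=2, interaction_only=True,
                                        include_bias=False)),
    # Recursively drop the weakest features until 3 remain.
    ("select", RFE(LinearRegression(), n_features_to_select=3)),
])
# pipeline.fit(df[["X1", "X2", "X3"]], y)  # df and y are hypothetical
# print(pipeline.named_steps["select"].support_)
```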
