How to do backward feature elimination when the model includes interactions between features
I have a multiple linear regression problem:
$Y$ is my target and $X_1, X_2, X_3$ are my features.
In my regression, I include the pairwise interactions between $X_1, X_2, X_3$ and an intercept (bias) term.
So my model is: $Y \sim X_1 + X_2 + X_3 + X_1X_2 + X_1X_3 + X_2X_3 + \text{bias}$
I fit my model with statsmodels (statsmodels.api, imported as sm),
and I want to recursively eliminate the feature with the highest p-value.
- My first question: if, for example, the highest p-value belongs to the $X_1X_2$ term, is it okay to eliminate that term even when $X_1$ and $X_2$ are individually statistically significant?
- My second question: if, in the first iteration, all the interactions involving some feature have p-values greater than 0.05, can I eliminate that feature together with all of its interactions?
Thank you for your help
Topic statsmodels linear-regression feature-selection
Category Data Science