Given a regression-based model with many feature variables, what tools would you use to figure out which feature variables contribute the most variance?

Suppose we have a hypothetical dataset {S} with 100 feature variables X and 10 predicted variables Y:

X1 ... X100   Y1 ... Y10
 1 ...    2    3 ...   4
 4 ...    3    2 ...   1

Let's say I want to improve the accuracy of Y1, and I am prepared to constrain or remove input variables in order to do so. How would I go about finding the culprits that make Y1 more variable than needed?

E.g. I find that X49 causes the biggest swing in the variance of Y1, and after constraining it Y1 is fitted better.

How would I go about finding that it's X49?

EDIT: I'm asking for approaches to sensitivity analysis, not for deciding which variables to remove. Let's assume all 100 X variables are important, but some need to be constrained (e.g. X49).

Topic multi-output variance regression dataset machine-learning

Category Data Science


There might be a smarter method, but I would simply fit a model without $X_i$ for each feature $X_i$, along with a reference model that uses all the features. Compared to the reference, the model with $X_{49}$ removed should show the lowest variance if $X_{49}$ is responsible for a lot of the variance.
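As a rough sketch of this leave-one-feature-out loop, assuming a synthetic dataset standing in for your real one (the data, the feature index 48 playing the role of "X49", and the use of ordinary least squares are all illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the 100-feature dataset: Y1 depends on a few
# features, with feature index 48 ("X49") driving the largest swings.
n, p = 500, 100
X = rng.normal(size=(n, p))
y1 = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 5.0 * X[:, 48] + rng.normal(scale=0.5, size=n)

# Reference model with all features: variance of its fitted Y1 values.
ref_pred = LinearRegression().fit(X, y1).predict(X)
ref_var = np.var(ref_pred)

# Leave-one-feature-out: refit without each X_i and record the variance
# of the fitted Y1. Removing the feature that contributes the most
# variance yields the largest drop relative to the reference.
loo_var = {}
for i in range(p):
    X_minus = np.delete(X, i, axis=1)
    pred = LinearRegression().fit(X_minus, y1).predict(X_minus)
    loo_var[i] = np.var(pred)

# The candidate "X49" is the feature whose removal leaves the lowest
# fitted variance.
culprit = min(loo_var, key=loo_var.get)
print(culprit, ref_var - loo_var[culprit])
```

With 100 features this means fitting 101 models, which is cheap for linear regression but may need batching or a faster importance proxy (e.g. permutation importance) for expensive model classes.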

Be careful: in general, a feature which causes a lot of variance is an important one, since if it weren't important it wouldn't have much impact on the target.
