Given a regression-based model with many feature variables, what tools would you use to figure out which feature variables contribute the most variance?

Suppose we have a hypothetical dataset {S} with 100 feature variables X and 10 predicted variables Y:

X1 ... X100   Y1 ... Y10
 1 ...    2    3 ...   4
 4 ...    3    2 ...   1

Let's say I want to improve the accuracy of Y1, and I am prepared to constrain or remove input variables in order to do so. How would I go about finding the culprits that make Y1 more variable than needed?

E.g. I find that X49 causes the biggest swing in the variance of Y1, and after constraining it Y1 is fitted better.

How would I go about finding that it's X49?

EDIT: I'm asking for approaches to sensitivity analysis, not for deciding which variables to remove. Let's assume all 100 X variables are important, but some need to be constrained (e.g. X49).

Topic multi-output variance regression dataset machine-learning

Category Data Science


There might be a smarter method, but I would simply fit a model without $X_i$ for each feature $X_i$, along with a reference model that uses all the features. Compared to the reference, the model with $X_{49}$ removed should show the lowest variance if $X_{49}$ is responsible for a lot of the variance.
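As a rough sketch of this leave-one-feature-out loop, assuming a synthetic dataset standing in for your real one (the data, the feature index 48 playing the role of "X49", and the use of ordinary least squares are all illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the 100-feature dataset: Y1 depends on a few
# features, with feature index 48 ("X49") driving the largest swings.
n, p = 500, 100
X = rng.normal(size=(n, p))
y1 = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 5.0 * X[:, 48] + rng.normal(scale=0.5, size=n)

# Reference model with all features: variance of its fitted Y1 values.
ref_pred = LinearRegression().fit(X, y1).predict(X)
ref_var = np.var(ref_pred)

# Leave-one-feature-out: refit without each X_i and record the variance
# of the fitted Y1. Removing the feature that contributes the most
# variance yields the largest drop relative to the reference.
loo_var = {}
for i in range(p):
    X_minus = np.delete(X, i, axis=1)
    pred = LinearRegression().fit(X_minus, y1).predict(X_minus)
    loo_var[i] = np.var(pred)

# The candidate "X49" is the feature whose removal leaves the lowest
# fitted variance.
culprit = min(loo_var, key=loo_var.get)
print(culprit, ref_var - loo_var[culprit])
```

With 100 features this means fitting 101 models, which is cheap for linear regression but may need batching or a faster importance proxy (e.g. permutation importance) for expensive model classes.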

Be careful: in general, a feature which causes a lot of variance is an important one, since if it weren't important it wouldn't have much impact on the target.
