Given a regression-based model with many feature variables, what tools would you use to figure out which feature variables contribute the most variance?
Consider a hypothetical dataset {S} with 100 X feature variables and 10 predicted Y variables:
| X1 | ... | X100 | Y1 | ... | Y10 |
|----|-----|------|----|-----|-----|
| 1  | ..  | 2    | 3  | ..  | 4   |
| 4  | ..  | 3    | 2  | ..  | 1   |
Let's say I want to improve the accuracy of Y1, and I am prepared to constrain or remove input variables to do so. How would I go about finding the culprits that make Y1 more variable than needed?
E.g. suppose X49 adds the biggest swing in variance to Y1, and after constraining it, Y1 is fitted better. How would I go about finding that it's X49?
EDIT: I'm asking for approaches to sensitivity analysis, not for deciding which variables to remove. Assume all 100 X variables are important, but some need to be constrained (e.g. X49).
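To make the ask concrete, here is a minimal sketch of the kind of analysis I have in mind, on synthetic data. The choice of `RandomForestRegressor` and the constrain-at-the-median scheme are just placeholders; the question is what principled tools exist for this. In the toy data below, feature 49 is deliberately made the dominant source of variance in Y1:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-in for dataset {S}: 100 features, one target Y1.
n_samples, n_features = 500, 100
X = rng.normal(size=(n_samples, n_features))
# Make feature 49 (0-indexed 48) the dominant contributor to Y1's variance.
y1 = X[:, 0] + 3.0 * X[:, 48] + rng.normal(scale=0.1, size=n_samples)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y1)

# One simple sensitivity measure: how much does the variance of the model's
# prediction drop when feature j is "constrained" (held fixed at its median)?
base_var = model.predict(X).var()
sensitivity = np.empty(n_features)
for j in range(n_features):
    X_fixed = X.copy()
    X_fixed[:, j] = np.median(X[:, j])  # constrain feature j
    sensitivity[j] = base_var - model.predict(X_fixed).var()

# Rank features by how much prediction variance their constraint removes.
top = np.argsort(sensitivity)[::-1][:5]
print("constraining these features removes the most variance in Y1:", top + 1)
```

On this toy data, X49 comes out on top. What I'm unsure about is whether this fix-one-feature-at-a-time approach is sound when features interact, which is why I'm asking for established sensitivity-analysis methods.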
Topic: multi-output | variance | regression | dataset | machine-learning
Category: Data Science