Why not use a constant instead of permutation for model-agnostic predictor importance?

I want to determine predictor importance. The ideal approach is to re-train the same model on the same dataset with each variable left out in turn, but that is too time consuming. The recommendation I have seen everywhere is to "remove" a column by turning it into noise, i.e. replacing it with a permutation of itself. Why is it not better to replace the variable with a constant, thus "muting" its signal?
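To make the two options concrete, this is the column-level operation I mean (a toy sketch in base R; the values are arbitrary):

```r
# Two ways to "remove" a predictor column without retraining the model
x <- c(3, 1, 4, 1, 5, 9, 2, 6)             # example predictor values

x_permuted <- sample(x)                    # permutation: same values, association with the target broken
x_constant <- rep(mean(x), length(x))      # constant "muting": all variation in x removed
```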

I ran an experiment on my own natural dataset, with highly cross-correlated variables removed. Variable importance was computed by muting each variable with the constants 0, the mean, the median, and values that occur frequently in that variable, as well as by permutation, across all of caret's regression models. The loss function is the Pearson correlation between the target variable and the model's prediction with the one variable muted. Cases where muting produced no error (the variable did not matter at all) were removed. All of the centering constants produced a smaller drop in correlation than permutation did in about 900 of 1400 cases. In 531 of 1421 cases the different constants did not produce the same error, which points to interactions and conditionality; at the very least, decision trees would behave this way. If a variable enters a polynomial only as an additive term, replacing it with any constant would produce the same correlation.
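For reference, here is a minimal sketch of that kind of comparison with caret, using synthetic data and a single linear model as a stand-in for the full set of regression models (the variable names, data, and model choice are placeholders, not my actual experiment):

```r
library(caret)

set.seed(1)
n  <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$y <- 2 * df$x1 + df$x2^2 + 0.5 * rnorm(n)

fit      <- train(y ~ ., data = df, method = "lm")    # any caret regression model
baseline <- cor(df$y, predict(fit, newdata = df))     # loss = Pearson correlation

# Drop in correlation after "muting" one column with a given replacement rule
drop_for <- function(col, mute) {
  d <- df
  d[[col]] <- mute(d[[col]])
  baseline - cor(df$y, predict(fit, newdata = d))
}

vars <- c("x1", "x2", "x3")
res  <- data.frame(
  permutation = sapply(vars, drop_for, mute = function(v) sample(v)),
  const_zero  = sapply(vars, drop_for, mute = function(v) rep(0,         length(v))),
  const_mean  = sapply(vars, drop_for, mute = function(v) rep(mean(v),   length(v))),
  const_med   = sapply(vars, drop_for, mute = function(v) rep(median(v), length(v)))
)
round(res, 3)
```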

From ESLII, on random forests: "The randomization effectively voids the effect of a variable, much like setting a coefficient to zero in a linear model (Exercise 15.7)." Or, in the same spirit, setting the variable to its mean.

Take every observed value of a variable and use each one in turn as the muting constant, weighting by how often that value occurs. The mean of those losses is what permutation importance converges to as more permutation iterations are run. Right?
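One way to probe that claim is to compute both quantities on the same fitted model and compare them directly; a rough sketch continuing the example above (an illustration, not a proof):

```r
# Continuing the sketch above (uses `df`, `fit`, `baseline`, and drop_for())

# (a) permutation importance averaged over many shuffles of x1
perm_avg <- mean(replicate(200, drop_for("x1", function(v) sample(v))))

# (b) constant "muting" with every observed value of x1 in turn;
#     using each row's value once weights the constants by their frequency
const_drops  <- sapply(df$x1, function(k) drop_for("x1", function(v) rep(k, length(v))))
weighted_avg <- mean(const_drops)

c(permutation_average = perm_avg, weighted_constant_average = weighted_avg)
```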

I cannot think of what to try next to understand this problem.

Topic: predictor-importance, feature-selection

Category: Data Science
