Interpreting the variance of feature importance outputs with each random forest run using the same parameters

I noticed that I am getting different feature importance results with each random forest run, even though I use the same parameters. I know that a random forest samples observations (and features) randomly, which causes the importance values to vary. The variation is especially pronounced for the less important variables.

My question is: how does one interpret the variance in random forest feature importance when running the model multiple times? I know that increasing the number of trees reduces the instability of the results; however, that still doesn't tell me whether my feature importance results are reliable. They may hold for one specific run, but not necessarily for a separate run.

Even if I were to use an extremely large number of trees and average the feature importance for each variable, that still doesn't guarantee that repeating the exact same process would produce the same importance results.

Additionally, I have tried an extremely large number of trees and still got slight variation in my feature importance results between runs, although it did significantly reduce the variance.
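For reference, the run-to-run spread described above can be quantified directly. A minimal sketch, assuming scikit-learn and a synthetic dataset (the sizes, seed range, and tree counts are illustrative, not from the original post):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data so the example is self-contained (illustrative only).
X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=3, random_state=0)

def importance_spread(n_estimators, seeds=range(5)):
    """Per-feature std of importances across runs with different seeds."""
    imps = np.array([
        RandomForestClassifier(n_estimators=n_estimators, random_state=s)
        .fit(X, y).feature_importances_
        for s in seeds
    ])
    return imps.std(axis=0)

# More trees shrink the run-to-run spread, but never to exactly zero.
spread_small = importance_spread(n_estimators=10)
spread_large = importance_spread(n_estimators=500)
print(spread_small.mean(), spread_large.mean())
```

With a setup like this you can see the effect reported in the question: the spread with 500 trees is much smaller than with 10 trees, yet still nonzero.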

Is there any method that I can use to interpret this variance of importance between runs?

I cannot set a seed because I need stable (similar) results across different seeds.

Any help at all would be greatly appreciated!

Topic feature-importances predictor-importance random-forest machine-learning

Category Data Science


Random forests are full of randomness, from bootstrapping (resampling the training data) to the random subset of features considered at each split of the individual decision trees. With all of this sampling going on, the starting seed affects the intermediate results as well as the final set of trees, and therefore the feature importance ranking too. So it is usually best to keep the seed fixed.

If your results are changing across multiple runs, averaging the feature importances over all of the runs should give you a good idea of what the 'true' values are.
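That averaging could be sketched as follows, assuming scikit-learn (the dataset, run count, and tree count are illustrative). Reporting the standard deviation alongside the mean shows which parts of the ranking are actually stable:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the questioner's data (illustrative only).
X, y = make_classification(n_samples=300, n_features=6,
                           n_informative=3, random_state=0)

# Repeat the fit with different seeds and stack the importance vectors.
runs = np.array([
    RandomForestClassifier(n_estimators=200, random_state=seed)
    .fit(X, y).feature_importances_
    for seed in range(10)
])

mean_imp = runs.mean(axis=0)  # averaged importance per feature
std_imp = runs.std(axis=0)    # run-to-run uncertainty per feature

# Features whose mean +/- 2*std bands overlap cannot be ranked reliably.
for i in np.argsort(mean_imp)[::-1]:
    print(f"feature {i}: {mean_imp[i]:.3f} +/- {2 * std_imp[i]:.3f}")
```

Interpreting the spread this way answers the question directly: two features whose uncertainty bands overlap are effectively tied, and their relative order will flip between runs.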
