Feature importance by removing all other features?
For neural-network feature importance, can I zero out all features except one to gauge that feature's importance? I know shuffling a feature (permutation importance) is one approach.
For example, keeping only the 4th feature:
feature_4 = [
    [0., 0., 0., 1.15, 0.],
    [0., 0., 0., 1.76, 0.],
    [0., 0., 0., 2.31, 0.],
    [0., 0., 0., 0.94, 0.],
]
_, probabilities = model.predict(feature_4)
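For concreteness, the zero-out probe could be sketched like this. Everything here is a placeholder assumption: a scikit-learn logistic regression stands in for the neural network, and accuracy stands in for whatever score the real model produces.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: only feature 1 drives the label, the rest are noise.
X = rng.normal(size=(200, 5))
y = (X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Zero-out probe: keep one column, zero all the others, and score.
scores = []
for j in range(X.shape[1]):
    X_only_j = np.zeros_like(X)
    X_only_j[:, j] = X[:, j]
    scores.append(model.score(X_only_j, y))

print(scores)
```

One caveat with this probe: an all-zeros-except-one row can be far outside the training distribution, so the model's behaviour on it may not reflect the feature's role in realistic inputs.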
The non-linear activation functions worry me, because the activation of a sum is not equal to the sum of the individual activations:
from scipy.special import expit #aka sigmoid
expit(2.0)
0.8807970779778823
expit(1.0)+expit(1.0)
1.4621171572600098
And softmax seems even less straightforward than sigmoid in this respect, since each output depends on all the logits at once.
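For comparison, the shuffling approach mentioned at the top avoids feeding the model off-distribution all-zeros inputs. A minimal permutation-importance sketch, again using a placeholder scikit-learn model on toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Same toy setup: only feature 1 carries signal.
X = rng.normal(size=(200, 5))
y = (X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

baseline = model.score(X, y)

# Permutation importance: shuffle one column at a time and
# record the drop in score relative to the baseline.
importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importances.append(baseline - model.score(X_perm, y))

print(importances)  # a large drop means the feature mattered
```

Because each shuffled column still has its marginal distribution, the non-linearity concern above is less of an issue: the model only ever sees feature values it could plausibly encounter.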