Comparing the rate of change of multiple weights
For a neural network trained with gradient descent, the weight update equation is:

$$W_i \leftarrow W_i - \alpha \frac{\partial L}{\partial W_i}$$
However, there are millions of such weights W_i. To capture how much each weight/connection W_i changes relative to the other weights, I sum the absolute magnitude of the gradient for each weight W_i:

$$S_i = \sum_{t=1}^{k} \left| \frac{\partial L}{\partial W_i} \right|_t$$

where the absolute gradient magnitudes are summed over all k training iterations, with k = (train dataset size) / (batch size).
After computing this sum for each weight W_i, I compare the sums and filter out the connections/weights that haven't changed much during training.
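As a minimal sketch of the procedure described above (the synthetic gradients, the number of weights, and the percentile threshold are all illustrative assumptions, not part of the question):

```python
import numpy as np

# Hypothetical setup: gradients for 5 weights over k mini-batch iterations.
rng = np.random.default_rng(0)
num_weights = 5
k = 100  # k = train dataset size / batch size

# Running sum of |gradient| per weight, updated once per iteration.
grad_abs_sum = np.zeros(num_weights)
for t in range(k):
    grads = rng.normal(size=num_weights)  # stand-in for real backprop gradients
    grads[3] *= 0.01                      # weight 3 barely changes during "training"
    grad_abs_sum += np.abs(grads)

# Filter connections whose accumulated change falls below a threshold,
# here the 20th percentile of all sums (the threshold choice is an assumption).
threshold = np.percentile(grad_abs_sum, 20)
low_change = np.where(grad_abs_sum <= threshold)[0]
print(low_change)  # weight 3 should be flagged
```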
Is there a better way to capture this rate of change? I thought about an Exponential Moving Average, but it gives more importance to recent values than to older ones, whereas the summation above weights all values equally.
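To make the concern concrete, here is a toy comparison (the gradient schedule and the smoothing factor beta are assumed for illustration) of one weight whose gradient is large for the first 100 iterations and then drops to zero: the plain sum retains the early history in full, while the EMA forgets it.

```python
# Plain sum vs. exponential moving average of |gradient| for one weight.
beta = 0.9  # assumed EMA smoothing factor
abs_sum, ema = 0.0, 0.0
for t in range(200):
    g = 1.0 if t < 100 else 0.0  # stand-in for |gradient| at iteration t
    abs_sum += g                       # every iteration counts equally
    ema = beta * ema + (1 - beta) * g  # recent iterations dominate

print(abs_sum)  # 100.0 — early changes fully retained
print(ema)      # close to 0 — early changes forgotten
```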
Thanks!
Topic mini-batch-gradient-descent gradient-descent neural-network
Category Data Science