Compare rate of change for multiple object/weights

For a neural network, the weight update equation (vanilla gradient descent) is:

$$W_i \leftarrow W_i - \eta \, \frac{\partial L}{\partial W_i}$$

However, there are millions of such weights $W_i$. Since I am interested in capturing how much each weight/connection $W_i$ changes relative to the other weights, I accumulate the absolute magnitude of the gradient for each weight $W_i$:

$$S_i = \sum_{t=1}^{k} \left| \frac{\partial L}{\partial W_i} \right|_t$$

where the absolute gradient magnitudes are summed over all $k$ training iterations, with number of training iterations $k$ = training-set size / batch size.

After computing this summation for each weight $W_i$, I compare these summations and can then filter out connections/weights that haven't changed much during training.
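A minimal sketch of this accumulation, assuming a NumPy array of per-iteration gradients stands in for what a real framework would produce via backprop (the gradient values, per-weight scales, and percentile threshold are all illustrative, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)

n_weights = 5   # toy model: 5 weights instead of millions
k = 100         # training iterations = dataset size / batch size

# Hypothetical per-iteration gradients dL/dW_i, shape (k, n_weights).
# Each weight gets a different scale so some weights "move" far more.
grads = rng.normal(size=(k, n_weights)) * np.array([1.0, 0.5, 0.1, 0.01, 0.001])

# S_i = sum over all k iterations of |dL/dW_i|
S = np.abs(grads).sum(axis=0)

# Filter connections that changed little: keep those whose accumulated
# gradient magnitude falls below an (illustrative) percentile cutoff.
threshold = np.percentile(S, 40)
stale = np.flatnonzero(S < threshold)

print(S)      # one accumulated magnitude per weight
print(stale)  # indices of the least-changed weights
```

With the scales above, the last two weights accumulate far smaller gradient magnitudes and are the ones flagged as stale.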

Is there a better way to capture this rate of change? I thought about an Exponential Moving Average (EMA), but it gives more importance to recent values than to older ones, whereas the summation above weights all iterations equally.
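To make the contrast concrete, here is a toy comparison for a single weight (the gradient sequence is made up: large gradients early in training, tiny ones later):

```python
beta = 0.9  # EMA decay factor (illustrative choice)

# Hypothetical |gradient| sequence for one weight: the weight moves a lot
# during the first 50 iterations, then barely at all.
grads = [1.0] * 50 + [0.01] * 50

ema = 0.0
total = 0.0
for g in grads:
    ema = beta * ema + (1 - beta) * abs(g)  # recency-weighted average
    total += abs(g)                         # the summation from above

print(ema)    # small (~0.015): the early large gradients are mostly forgotten
print(total)  # 50.5: every iteration contributes equally
```

The EMA ends up small because the early large gradients have decayed away, while the plain summation still reflects the full training history, which is the behavior described above.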

Thanks!

Topic mini-batch-gradient-descent gradient-descent neural-network

Category Data Science
