What exactly is activity sparsity and why is it beneficial?

I have been reading about weight sparsity and activity sparsity in the context of convolutional neural networks. Weight sparsity I understand as having more trainable weights that are exactly zero, which essentially means fewer connections, allowing for a smaller memory footprint and quicker inference at test time. Additionally, it should help against overfitting (which I understand in terms of smaller weights leading to simpler models/Ockham's razor). As far as I understand, activity sparsity is analogous in that it leads to fewer non-zero activations. I find it difficult to see what this means exactly, and how having fewer non-zero activations would help against overfitting.
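
To make the analogy concrete, here is how I currently understand the two options in Keras: the penalty is placed on the weights in one case and on the layer's outputs in the other (the layer size and penalty strengths below are just placeholders, not tuned values):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Weight sparsity: an L1 penalty on the kernel pushes individual
# weights toward exactly zero (fewer effective connections).
weight_sparse = layers.Dense(
    64,
    activation="relu",
    kernel_regularizer=regularizers.l1(1e-4),
)

# Activity sparsity: an L1 penalty on the layer's *output* pushes
# activations toward zero (fewer neurons firing per example).
activity_sparse = layers.Dense(
    64,
    activation="relu",
    activity_regularizer=regularizers.l1(1e-4),
)
```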

Specifically, I have been doing some initial regularisation searches while training a CNN on handwritten digit data, and I found good performance with activity regularisation (activity_regularizer in Keras/TensorFlow). Before I actually use this, I'd like to understand what I'm implementing.
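
For reference, this is roughly the kind of setup I am experimenting with (a minimal sketch; the architecture and the 1e-5 penalty strength are illustrative, not my actual tuned values):

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

# Minimal digit classifier with activity regularisation on the conv layer.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(
        32, (3, 3), activation="relu",
        # Adds lambda * sum(|activations|) to the training loss,
        # encouraging the feature maps to be mostly zero per input.
        activity_regularizer=regularizers.l1(1e-5),
    ),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```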

Tags: sparse, sparsity, cnn, regularization

