What exactly is activity sparsity and why is it beneficial?
I have been reading about weight sparsity and activity sparsity with regard to convolutional neural networks. Weight sparsity I understand as having more trainable weights be exactly zero, which essentially means having fewer connections, allowing for a smaller memory footprint and quicker inference on test data. Additionally, it helps against overfitting (which I understand in terms of smaller weights leading to simpler models/Ockham's razor). From what I understand, activity sparsity is analogous in that it leads to fewer non-zero activations. I find it difficult to see what this means exactly, and how having fewer non-zero activations would help against overfitting.
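To make sure I have the distinction right, here is a tiny numeric sketch of how I currently picture the two kinds of sparsity (the array shapes, the threshold, and the numbers are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight matrix: pretend training/pruning drove small weights to exactly zero.
W = rng.normal(size=(64, 128))
W[np.abs(W) < 1.0] = 0.0

# Hypothetical batch of inputs; ReLU already produces many exact zeros in the output.
x = rng.normal(size=(32, 64))
a = np.maximum(x @ W, 0.0)

weight_sparsity = np.mean(W == 0)    # fraction of zero weights (missing connections)
activity_sparsity = np.mean(a == 0)  # fraction of zero activations for this batch

print(f"weight sparsity:   {weight_sparsity:.2%}")
print(f"activity sparsity: {activity_sparsity:.2%}")
```

So weight sparsity is a property of the learned parameters themselves, while activity sparsity depends on the inputs: the same network can be more or less activity-sparse on different data.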
Specifically, I have been doing some initial regularisation searches while training a CNN on handwritten digit data, and I encountered good performance with activity regularisation (activity_regularizer in Keras/TensorFlow). Before I actually use this, I'd like to understand what I'm implementing.
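For reference, this is roughly the kind of setup I mean; the layer sizes and the 1e-5 regularizer strengths are just placeholders from my initial search, not tuned values:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1),
                  # penalises the L1 norm of the layer's OUTPUT, pushing
                  # many activations toward zero: activity sparsity
                  activity_regularizer=regularizers.l1(1e-5)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu",
                  # for contrast: penalising the WEIGHTS instead pushes
                  # connections toward zero: weight sparsity
                  kernel_regularizer=regularizers.l1(1e-5)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

As I understand it, the only difference between the two regularizer arguments is what the L1 penalty is applied to (the layer output versus the kernel), but I'd like confirmation of what the activity version is actually doing for generalisation.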
Topic: sparse, sparsity, cnn, regularization
Category: Data Science