Why is the number of neurons or convolution filters chosen to be a power of two?

In the overwhelming majority of works on neural networks, the authors suggest architectures in which the number of neurons in each layer is a power of 2.

What are the theoretical reasons (or prerequisites) for this choice?

Topic: information-theory, neural-network, machine-learning

Category: Data Science


The reason is hardware-based. For neural networks and deep learning, matrix operations are the main computation and the main source of floating-point operations (FLOPs). Single Instruction, Multiple Data (SIMD) units in CPUs operate on vectors whose widths are powers of 2, so power-of-two dimensions tend to divide evenly into those vectors.

And for GPUs:

https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html

Memory allocated through the CUDA Runtime API, such as via cudaMalloc(), is guaranteed to be aligned to at least 256 bytes. Therefore, choosing sensible thread block sizes, such as multiples of the warp size (i.e., 32 on current GPUs), facilitates memory accesses by warps that are properly aligned. (Consider what would happen to the memory addresses accessed by the second, third, and subsequent thread blocks if the thread block size was not a multiple of warp size, for example.)

This means that any multiple of 32 will optimize memory access, and thus processing speed, when you are using a GPU.
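To make the quoted guidance concrete, here is a small sketch of how a launch configuration is typically computed; the helper name `launch_config` and the default block size of 128 are my own illustrative choices, not from the CUDA documentation:

```python
# Pick a thread block size that is a multiple of the warp size
# (32 on current NVIDIA GPUs) and compute how many blocks are
# needed to cover n elements.
WARP_SIZE = 32

def launch_config(n, threads_per_block=128):
    # A warp-multiple block size lets each warp access a contiguous,
    # properly aligned slice of memory.
    assert threads_per_block % WARP_SIZE == 0, "block size is not a warp multiple"
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    return blocks, threads_per_block

print(launch_config(1000))  # -> (8, 128): 8 blocks of 128 threads cover 1000 elements
```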

Take a look at this if you are interested:

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf


It is just an arbitrary choice. You have to choose some number, and the order of magnitude matters, but the exact value does not. Powers of two just feel natural.

If you don't think so, evaluate it on a given architecture: lower the number of neurons from a power of two to a smaller number. If the time increases, you've proven me wrong.
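Here is a sketch of that experiment, assuming PyTorch; the framework and the specific sizes (a hidden layer of 256 versus 250) are my choices for illustration, and on most hardware the measured gap is negligible:

```python
# Time forward + backward passes for a power-of-two hidden layer (256)
# versus a slightly smaller one (250).
import time
import torch

def time_layer(hidden, steps=100, batch=64, features=512):
    model = torch.nn.Sequential(
        torch.nn.Linear(features, hidden),
        torch.nn.ReLU(),
        torch.nn.Linear(hidden, 10),
    )
    x = torch.randn(batch, features)
    start = time.perf_counter()
    for _ in range(steps):
        model(x).sum().backward()  # dummy scalar loss
        model.zero_grad()
    return (time.perf_counter() - start) / steps

for hidden in (256, 250):
    print(f"hidden={hidden}: {time_layer(hidden) * 1e3:.2f} ms/step")
```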


Deep neural networks are usually trained on GPUs to speed up training time. Using powers of two for the network topology follows the same logic as using powers of two for texture dimensions in computer games.

The GPU can take advantage of optimizations that apply when dimensions are powers of two (see https://gamedev.stackexchange.com/questions/26187/why-are-textures-always-square-powers-of-two-what-if-they-arent).
