Where are the 60 million params of AlexNet?
In the abstract of the AlexNet paper, the authors claim the network has 60 million parameters:
The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
When I implement the model with Keras, I get only ~25 million parameters.
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(96, 11, strides=4, activation='relu', input_shape=[227, 227, 3]),
    tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),
    tf.keras.layers.Conv2D(256, 5, activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),
    tf.keras.layers.Conv2D(384, 3, activation='relu', padding='same'),
    tf.keras.layers.Conv2D(384, 3, activation='relu', padding='same'),
    tf.keras.layers.Conv2D(256, 3, activation='relu', padding='same'),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dense(1000, activation='softmax'),
])
Note that I removed the normalization layers and used an input size of 227×227 instead of 224×224. See this question for details.
Here is the summary from Keras:
Model: sequential
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 55, 55, 96) 34944
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 27, 27, 96) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 27, 27, 256) 614656
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 256) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 13, 13, 384) 885120
_________________________________________________________________
conv2d_3 (Conv2D) (None, 13, 13, 384) 1327488
_________________________________________________________________
conv2d_4 (Conv2D) (None, 13, 13, 256) 884992
_________________________________________________________________
dense (Dense) (None, 13, 13, 4096) 1052672
_________________________________________________________________
dense_1 (Dense) (None, 13, 13, 4096) 16781312
_________________________________________________________________
dense_2 (Dense) (None, 13, 13, 1000) 4097000
=================================================================
Total params: 25,678,184
Trainable params: 25,678,184
Non-trainable params: 0
_________________________________________________________________
I'm really far from 60 million. So how did they arrive at 60 million parameters?
For reference, here is the architecture of the model as described in Sec. 3.5 of the paper:
The first convolutional layer filters the 224x224x3 input image with 96 kernels of size 11x11x3 with a stride of 4 pixels (this is the distance between the receptive field centers of neighboring neurons in a kernel map). The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5x5x48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3x3x256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3x3x192, and the fifth convolutional layer has 256 kernels of size 3x3x192. The fully-connected layers have 4096 neurons each.
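Summing the per-layer counts implied by the quoted description does land near 60 million. A sketch of the tally (the 6×6×256 input to the first fully-connected layer is my assumption of the pooled conv5 output; the 5×5×48 and 3×3×192 kernel depths reflect the paper's two-GPU split):

```python
# Per-layer parameter counts (weights + biases) per Sec. 3.5 of the paper.
layers = {
    'conv1': 11 * 11 * 3 * 96 + 96,      # 34,944
    'conv2': 5 * 5 * 48 * 256 + 256,     # grouped: input depth 48, not 96
    'conv3': 3 * 3 * 256 * 384 + 384,
    'conv4': 3 * 3 * 192 * 384 + 384,    # grouped: input depth 192
    'conv5': 3 * 3 * 192 * 256 + 256,    # grouped: input depth 192
    'fc6':   6 * 6 * 256 * 4096 + 4096,  # assumes flattened 6x6x256 pooled conv5 output
    'fc7':   4096 * 4096 + 4096,
    'fc8':   4096 * 1000 + 1000,
}
total = sum(layers.values())
print(f'{total:,}')  # 60,965,224 -- i.e. ~61 million
```

Note that under these assumptions the first fully-connected layer alone accounts for ~37.7 million of the total.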
Topic alex-net cnn keras convolutional-neural-network neural-network
Category Data Science