Where are the 60 million params of AlexNet?

In the abstract of the AlexNet paper, the authors claim 60 million parameters:

The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

When I implement the model with Keras, I get ~25 million params.

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(96, 11, strides=4, activation="relu", input_shape=[227,227,3]),
    tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=(2,2)),
    tf.keras.layers.Conv2D(256, 5, activation="relu", padding="SAME"),
    tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=(2,2)),
    tf.keras.layers.Conv2D(384, 3, activation="relu", padding="SAME"),
    tf.keras.layers.Conv2D(384, 3, activation="relu", padding="SAME"),
    tf.keras.layers.Conv2D(256, 3, activation="relu", padding="SAME"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(1000, activation="softmax"),
])

Note that I removed the normalization layers and set the input to 227×227 instead of 224×224. See this question for details.

Here is the summary from Keras:

Model: sequential
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 55, 55, 96)        34944     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 27, 27, 96)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 27, 27, 256)       614656    
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 256)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 13, 13, 384)       885120    
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 13, 13, 384)       1327488   
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 13, 13, 256)       884992    
_________________________________________________________________
dense (Dense)                (None, 13, 13, 4096)      1052672   
_________________________________________________________________
dense_1 (Dense)              (None, 13, 13, 4096)      16781312  
_________________________________________________________________
dense_2 (Dense)              (None, 13, 13, 1000)      4097000   
=================================================================
Total params: 25,678,184
Trainable params: 25,678,184
Non-trainable params: 0
_________________________________________________________________

I'm well short of 60 million. So how did they arrive at 60 million params?

For reference, here is the architecture of the model as described in Sec. 3.5 of the paper:

The first convolutional layer filters the 224x224x3 input image with 96 kernels of size 11x11x3 with a stride of 4 pixels (this is the distance between the receptive field centers of neighboring neurons in a kernel map). The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5x5x48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3x3x256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3x3x192, and the fifth convolutional layer has 256 kernels of size 3x3x192. The fully-connected layers have 4096 neurons each.
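As a sanity check (my own arithmetic, not from the paper), each Conv2D layer's parameter count follows the standard formula (k · k · c_in + 1) · c_out, and these match the summary above, so the convolutional layers are not where the discrepancy lies:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a Conv2D layer: one k x k x c_in kernel plus a bias, per output channel."""
    return (k * k * c_in + 1) * c_out

print(conv_params(11, 3, 96))    # 34944   -> conv2d
print(conv_params(5, 96, 256))   # 614656  -> conv2d_1
print(conv_params(3, 256, 384))  # 885120  -> conv2d_2
print(conv_params(3, 384, 384))  # 1327488 -> conv2d_3
print(conv_params(3, 384, 256))  # 884992  -> conv2d_4
```

All five values agree with the Keras summary, so the missing ~35 million params must be in the fully-connected layers.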



I forgot to flatten between the last Conv2D layer and the first fully-connected layer.

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(96, 11, strides=4, activation="relu", input_shape=[227,227,3]),
    tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=(2,2)),
    tf.keras.layers.Conv2D(256, 5, activation="relu", padding="SAME"),
    tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=(2,2)),
    tf.keras.layers.Conv2D(384, 3, activation="relu", padding="SAME"),
    tf.keras.layers.Conv2D(384, 3, activation="relu", padding="SAME"),
    tf.keras.layers.Conv2D(256, 3, activation="relu", padding="SAME"),
    tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=(2,2)), # pooling after conv5, as in the paper
    tf.keras.layers.Flatten(), # <-- This layer
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(1000, activation="softmax"),
])

With the Flatten layer added, I get 62 million params:

Model: "alex_net"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              multiple                  34944     
_________________________________________________________________
conv2d_1 (Conv2D)            multiple                  614656    
_________________________________________________________________
conv2d_2 (Conv2D)            multiple                  885120    
_________________________________________________________________
conv2d_3 (Conv2D)            multiple                  1327488   
_________________________________________________________________
conv2d_4 (Conv2D)            multiple                  884992    
_________________________________________________________________
max_pooling2d (MaxPooling2D) multiple                  0         
_________________________________________________________________
flatten (Flatten)            multiple                  0         
_________________________________________________________________
dense (Dense)                multiple                  37752832  
_________________________________________________________________
dense_1 (Dense)              multiple                  16781312  
_________________________________________________________________
dense_2 (Dense)              multiple                  4097000   
=================================================================
Total params: 62,378,344
Trainable params: 62,378,344
Non-trainable params: 0
_________________________________________________________________
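The jump in the first Dense layer accounts for almost all of the difference. A quick check (my arithmetic, using the 6×6×256 feature map left after the final pooling): without Flatten, Keras applies a Dense layer only to the last axis, i.e. to 256 channels per spatial position; with Flatten, it sees all 6·6·256 = 9216 values at once:

```python
# Without Flatten: Dense(4096) acts on the last axis only (256 channels + bias).
no_flatten = (256 + 1) * 4096
# With Flatten: Dense(4096) acts on the full 6x6x256 = 9216 feature vector.
with_flatten = (6 * 6 * 256 + 1) * 4096

print(no_flatten)    # 1052672  -> matches "dense" in the first summary
print(with_flatten)  # 37752832 -> matches "dense" in the corrected summary
```

That single layer supplies the ~36.7 million params that were missing from my first attempt.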

Even though this was my own mistake, I'm leaving the answer here for reference. Note that the total is 62 million rather than the paper's 60 million: in the original two-GPU model, several layers see only half the input channels (e.g. 5×5×48 kernels instead of 5×5×96, per the excerpt above), which trims the count slightly.
