AlexNet Research Paper vs. PyTorch and TensorFlow Implementations

I'm making my way through deep learning research papers, starting with AlexNet, and I found differences between the paper and the PyTorch and TensorFlow implementations that I can't explain.

In the research paper, the authors describe the model architecture as follows:

The first convolutional layer filters the 224x224x3 input image with 96 kernels of size 11x11x3 with a stride of 4 pixels (this is the distance between the receptive field centers of neighboring neurons in a kernel map).

The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5x5x48.

The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers.

The third convolutional layer has 384 kernels of size 3x3x256 connected to the (normalized, pooled) outputs of the second convolutional layer.

The fourth convolutional layer has 384 kernels of size 3x3x192.

The fifth convolutional layer has 256 kernels of size 3x3x192.

Note: the fully connected layers are not included here.
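To compare these numbers with the implementations below, here is a minimal sketch of the standard convolution output-size formula applied to the paper's first layer (the padding values are my assumption; the paper does not state them explicitly):

def conv_out(size, kernel, stride=1, padding=0):
    # floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

print(conv_out(224, 11, stride=4))             # 54 -> 54x54 with no padding
print(conv_out(224, 11, stride=4, padding=2))  # 55 -> 55x55 with the padding=2 used below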

Here is the PyTorch implementation of the model:

import torch.nn as nn

features = nn.Sequential(
    # 1st Layer
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),

    # 2nd Layer
    nn.Conv2d(64, 192, kernel_size=5, padding=2),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),

    # 3rd Layer
    nn.Conv2d(192, 384, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),

    # 4th Layer
    nn.Conv2d(384, 256, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),

    # 5th Layer
    nn.Conv2d(256, 256, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)
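As a quick sanity check (a minimal sketch; the 1x3x224x224 shape is just a dummy batch matching the paper's input size), passing a random tensor through this stack yields a 256-channel 6x6 feature map:

import torch

x = torch.randn(1, 3, 224, 224)  # dummy batch: one 224x224 RGB image
print(features(x).shape)         # torch.Size([1, 256, 6, 6])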

And the TensorFlow implementation:

import tensorflow as tf

slim = tf.contrib.slim  # TF 1.x API; `inputs` is assumed to be a [batch, 224, 224, 3] image tensor

# 1st layer
net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID', scope='conv1')
net = slim.max_pool2d(net, [3, 3], 2, scope='pool1')

# 2nd layer
net = slim.conv2d(net, 192, [5, 5], scope='conv2')
net = slim.max_pool2d(net, [3, 3], 2, scope='pool2')

# 3rd layer
net = slim.conv2d(net, 384, [3, 3], scope='conv3')

# 4th layer
net = slim.conv2d(net, 384, [3, 3], scope='conv4')

# 5th layer
net = slim.conv2d(net, 256, [3, 3], scope='conv5')
net = slim.max_pool2d(net, [3, 3], 2, scope='pool5')

Note: slim.conv2d applies a ReLU activation by default, so every convolutional layer in the TensorFlow version is followed by a ReLU even though it is not written out.
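For reference, the default is activation_fn=tf.nn.relu, so the first layer above is equivalent to the following sketch with the activation spelled out explicitly:

# Same as the conv1 line above, with the default ReLU activation made explicit
net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID',
                  activation_fn=tf.nn.relu, scope='conv1')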

Why do both implementations use 64 kernels in the first layer when the paper specifies 96?

Why do both implementations use 192 kernels in the second layer when the paper specifies 256?

[PyTorch] Why is there a padding of 2 on layers 1 and 2?

[PyTorch] Why does the 4th layer have 256 kernels when the paper specifies 384?

Tags: alex-net, pytorch, tensorflow, deep-learning

Category: Data Science
