Actually, the answers above seem to be wrong. Indeed, the naming was a big mess. However, it seems that it was cleared up in the paper that introduces Inception-v4 (see: "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning"):

The Inception deep convolutional architecture was introduced as GoogLeNet in (Szegedy et al. 2015a), here named Inception-v1. Later the Inception architecture was refined in various ways, first by the introduction of batch normalization (Ioffe and Szegedy 2015) (Inception-v2). Later by additional factorization ideas in the third iteration (Szegedy et al. 2015b) which will be referred to as Inception-v3 in this report.


In the Batch Normalization paper (Ioffe and Szegedy, 2015), the authors took a variant of the GoogLeNet architecture from Going Deeper with Convolutions (i.e. Inception-v1) and introduced Batch Normalization into it (BN-Inception).

The main difference to the network described in (Szegedy et al., 2014) is that the 5x5 convolutional layers are replaced by two consecutive layers of 3x3 convolutions with up to 128 filters.
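As a minimal PyTorch sketch (not from either paper) of that replacement: a single 5x5 convolution versus two stacked 3x3 convolutions covering the same receptive field. The channel sizes and input shape here are placeholders chosen for illustration.

```python
import torch
import torch.nn as nn

in_ch, out_ch = 64, 128

# Original-style 5x5 convolution (padding keeps the spatial size unchanged).
conv5x5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)

# BN-Inception-style replacement: two consecutive 3x3 convolutions.
# Two stacked 3x3 kernels also cover a 5x5 receptive field but use
# fewer weights per channel pair (2 * 3*3 = 18 vs. 5*5 = 25).
conv3x3_stack = nn.Sequential(
    nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, in_ch, 28, 28)
print(conv5x5(x).shape, conv3x3_stack(x).shape)  # both: [1, 128, 28, 28]
```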

Then, in the paper Rethinking the Inception Architecture for Computer Vision, the authors proposed Inception-v2 and Inception-v3.

In Inception-v2, they introduced factorization (factorizing convolutions into smaller convolutions) and made some other minor changes to Inception-v1.

Note that we have factorized the traditional 7x7 convolution into three 3x3 convolutions
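To illustrate that quote, here is a hedged PyTorch sketch of factorizing a 7x7 convolution into three stacked 3x3 convolutions; the channel counts, strides, and the helper name factorized_7x7 are assumptions for illustration, not values taken from the paper.

```python
import torch.nn as nn

def factorized_7x7(in_ch: int, out_ch: int) -> nn.Sequential:
    # Three stacked 3x3 convolutions cover the same 7x7 receptive field as a
    # single 7x7 convolution while using fewer weights per channel pair
    # (3 * 3*3 = 27 vs. 7*7 = 49).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )
```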

As for Inception-v3, it is a variant of Inception-v2 which adds BN-auxiliary.

BN-auxiliary refers to the version in which the fully connected layer of the auxiliary classifier is also batch-normalized, not just the convolutions. We are referring to the model [Inception-v2 + BN-auxiliary] as Inception-v3.
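To make "BN-auxiliary" concrete, below is a hypothetical PyTorch sketch of an auxiliary classifier in which the fully connected layer is batch-normalized as well as the convolution. The layer sizes (128 channels, 1024 units, a 5x5 pooled grid) and the class name are illustrative guesses, not the paper's exact configuration.

```python
import torch.nn as nn

class BNAuxClassifier(nn.Module):
    def __init__(self, in_ch: int, num_classes: int = 1000):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(5)           # pool the side-branch features
        self.conv = nn.Conv2d(in_ch, 128, kernel_size=1)
        self.conv_bn = nn.BatchNorm2d(128)            # BN on the convolution
        self.fc = nn.Linear(128 * 5 * 5, 1024)
        self.fc_bn = nn.BatchNorm1d(1024)             # BN on the FC layer as well
        self.out = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.conv_bn(self.conv(self.pool(x))).relu()
        x = x.flatten(1)
        x = self.fc_bn(self.fc(x)).relu()
        return self.out(x)
```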


Besides what was mentioned by daoliker:

Inception-v2 utilized a separable convolution as its first layer, with depth 64.

Quote from the paper:

Our model employed separable convolution with depth multiplier 8 on the first convolutional layer. This reduces the computational cost while increasing the memory consumption at training time.

Why is this important? Because it was dropped in v3, v4, and Inception-ResNet, but was re-introduced and heavily used in MobileNet later (a rough sketch follows below).
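For readers unfamiliar with the term, here is a rough PyTorch sketch of a depthwise-separable first layer with a depth multiplier, in the spirit of the quote above and of MobileNet. The kernel size, stride, and output width are assumptions, and the class name SeparableStem is made up for this example.

```python
import torch
import torch.nn as nn

class SeparableStem(nn.Module):
    def __init__(self, in_ch: int = 3, depth_multiplier: int = 8, out_ch: int = 64):
        super().__init__()
        mid_ch = in_ch * depth_multiplier  # 3 * 8 = 24 depthwise channels
        # Depthwise step: groups=in_ch gives each input channel its own filters.
        self.depthwise = nn.Conv2d(in_ch, mid_ch, kernel_size=3, stride=2,
                                   padding=1, groups=in_ch)
        # Pointwise step: a 1x1 convolution mixes the channels to the final depth.
        self.pointwise = nn.Conv2d(mid_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 3, 224, 224)
print(SeparableStem()(x).shape)  # [1, 64, 112, 112]
```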


The answer can be found in the Rethinking the Inception Architecture for Computer Vision paper: https://arxiv.org/pdf/1512.00567v3.pdf

Check Table 3. Inception-v2 is the architecture described in that paper; Inception-v3 is the same architecture (with minor changes) trained with a different procedure (RMSProp, a label-smoothing regularizer, an auxiliary head with batch norm to improve training, etc.).
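As one concrete example of those training changes, the snippet below shows label smoothing via PyTorch's built-in cross-entropy option. The smoothing value 0.1 matches the epsilon reported in the Rethinking paper; the batch and class sizes are arbitrary placeholders.

```python
import torch
import torch.nn as nn

# Label-smoothing regularizer: soften the one-hot targets by epsilon = 0.1.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 1000)            # a batch of 8 predictions, 1000 classes
targets = torch.randint(0, 1000, (8,))   # integer class labels
loss = criterion(logits, targets)
print(loss.item())
```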
