In spatial transformer networks, the idea of the localisation network is to learn a transformation that maps the input to a canonical form. Think of the network's output $\theta$ as just another activation that is passed on to the next stage: it specifies how the sampling should be performed, and the key point is that the whole sampling sequence of operations is differentiable. The sampler usually uses bilinear interpolation, which, although not differentiable at every point because of the floor and ceiling operations, is differentiable almost everywhere in its inputs, so the error can still be backpropagated through it. In short, treat $\theta$ simply as an activation that is fed to the bilinear sampler to transform the input of the next network, and bilinear sampling is considered differentiable.
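
To make the "differentiable almost everywhere" point concrete, here is a minimal sketch (not code from the paper) of bilinear sampling at a single point, written with PyTorch autograd. The floor operation itself carries no gradient, but the fractional offsets do, so the sampled value is piecewise linear in the coordinates and gradients flow back to them except exactly at integer positions:

```python
import torch

def bilinear_sample(img, x, y):
    """img: (H, W) tensor; x, y: scalar tensors in pixel coordinates."""
    x0 = torch.floor(x).long()      # left corner index (no gradient through floor)
    y0 = torch.floor(y).long()      # top corner index
    x1, y1 = x0 + 1, y0 + 1         # right / bottom corner indices
    # Fractional offsets: these DO carry gradient with respect to x and y.
    wx, wy = x - x0.float(), y - y0.float()
    # Weighted sum of the four neighbouring pixels.
    return ((1 - wx) * (1 - wy) * img[y0, x0] +
            wx       * (1 - wy) * img[y0, x1] +
            (1 - wx) * wy       * img[y1, x0] +
            wx       * wy       * img[y1, x1])

img = torch.arange(16.0).reshape(4, 4)
x = torch.tensor(1.3, requires_grad=True)
y = torch.tensor(2.7, requires_grad=True)
out = bilinear_sample(img, x, y)
out.backward()
print(out.item(), x.grad.item(), y.grad.item())  # gradients reach the sampling coordinates
```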

To understand this better, consider the following figure, which illustrates the process inside a spatial transformer more clearly than the one in the original paper.

[Figure: the spatial transformer pipeline — localisation network, grid generator, and bilinear sampler producing the transformed image.]

As the figure shows, the output of the localisation network, $\theta$, is passed to the grid generator. The regular sampling grid is multiplied by $\theta$ to find the corresponding locations in the original image. Note that $\theta$ is not multiplied by the original image itself: if it were, a single output pixel could end up with multiple candidate values, whereas applying $\theta$ to the sampling grid gives exactly one source location per grid entry. Next, the transformed grid and the original image are used in the interpolation step to produce the transformed image. As you can see, $\theta$ behaves just like any other activation.
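
As a minimal sketch of this grid-generator-plus-sampler path, assuming a 2D affine $\theta$ of shape $(N, 2, 3)$, you can use PyTorch's `affine_grid` and `grid_sample`: `affine_grid` applies $\theta$ to a regular grid of output coordinates, and `grid_sample` bilinearly interpolates the input image at those coordinates, so gradients flow back through both steps into $\theta$:

```python
import torch
import torch.nn.functional as F

N, C, H, W = 1, 1, 8, 8
image = torch.rand(N, C, H, W)

# Pretend this is the localisation network's output (here: the identity transform).
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]], requires_grad=True)

grid = F.affine_grid(theta, size=(N, C, H, W), align_corners=False)  # (N, H, W, 2) sampling locations
warped = F.grid_sample(image, grid, align_corners=False)             # bilinear interpolation of the input

warped.sum().backward()
print(theta.grad)  # non-None: the error backpropagates to theta like any other activation
```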
