How is the output of a maxpool layer window size=1x2 and stride=2 calculated?
I'm looking at the architecture proposed in the following paper: Baoguang Shi et al, An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition.
In the proposed architecture of the model, a MaxPooling Window:1 × 2, s:2 layer is mentioned. I'm not sure what the size of the output of this layer would be.
If i have an input of size (32 x 8), then the output would be:
(32-1)/2 + 1 = 16.5, - this part doesn't make sense to me
(8-2)/2 + 1 = 4
*ignoring depth and batch size here
Topic pooling cnn neural-network
Category Data Science