Yes, the spatial dimensions (height and width) are reduced: the input is 16x16, H1 is 8x8 and H2 is 4x4.
Also see the first paragraph in the architecture section of the paper.
In modern terms you would say that they use a stride of 2, which reduces the spatial dimensions accordingly.
EDIT (based on your comment)
The formula for the spatial output dimension $O$ of a (square shaped) convolutional layer is the following:
$$O = \frac{I - K + 2P}{S} + 1$$ with $I$ being the input size, $K$ the kernel size, $P$ the padding, and $S$ the stride. Now you might think that in your example $O = \frac{16 - 5 + 2 \cdot 2}{2} + 1 = 8.5$ (assuming $P = 2$).
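As a quick sanity check, you can evaluate the formula directly (a minimal sketch in plain Python; the function name is mine):

```python
def conv_out_size(i, k, p, s):
    """Spatial output size of a square convolution: O = (I - K + 2P) / S + 1."""
    return (i - k + 2 * p) / s + 1

# Naive assumption of P = 2 on every side:
print(conv_out_size(16, 5, 2, 2))  # 8.5 -- not an integer, so something is off
```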
But take a closer look at how it actually plays out when the 5x5 kernel of layer H1 scans the 16x16 input image with a stride of 2:
*(illustration: the 5x5 kernel's scan positions over the 16x16 input, with the padded border shown in light grey)*
As the illustration shows (the light grey area marking the padded border), the required and effective padding is actually not 2 on all sides. Instead, for the width and height respectively, it is 2 on one side and 1 on the other, i.e. $(2+1)/2 = 1.5$ on average.
And if you plug that into the equation to calculate the output size, it gives $O = \frac{16 - 5 + 2 \cdot 1.5}{2} + 1 = 8$. Accordingly, the convolutional layer H1 has spatial dimensions of 8x8.
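You can confirm the same result either by flooring the division, which is what modern frameworks such as PyTorch do for their output-shape formula, or by explicitly enumerating the kernel's start positions on the padded input (a sketch; the function names are mine):

```python
import math

def conv_out_size_floored(i, k, p, s):
    # Frameworks floor the division, which absorbs the "missing" half pixel of padding
    return math.floor((i - k + 2 * p) / s) + 1

def count_kernel_positions(i, k, p, s):
    # Enumerate the start positions of the kernel on an input padded by p on both sides
    padded = i + 2 * p                          # 16 + 2*2 = 20
    starts = list(range(0, padded - k + 1, s))  # 0, 2, ..., 14 -> 8 positions
    return len(starts)

print(conv_out_size_floored(16, 5, 2, 2))   # 8
print(count_kernel_positions(16, 5, 2, 2))  # 8
```

Note that the last start position (14) covers padded columns 14 to 18, so the final padding column (index 19) is never touched: effectively 2 pixels of padding on one side and only 1 on the other, which is exactly the asymmetry described above.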