Understanding nn.Conv2d in PyTorch
I am trying to learn the basics of PyTorch so I can assemble my own CNNs. One thing I am also trying to learn is how to navigate the API documentation.
Specifically, at the moment I am trying to read through nn.Conv2d. I quote the documentation:
Applies a 2D convolution over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size $(N,C_{in},H,W)$ and output $(N,C_{out},H_{out},W_{out})$ can be precisely described as $$ \text{out}(N_i,C_{out_j}) = \text{bias}(C_{out_j}) + \sum_{k=0}^{C_{in}-1} \text{weight}(C_{out_j},k) * \text{input}(N_i,k) $$ where $*$ is the valid 2D cross-correlation operator, $N$ is a batch size, $C$ denotes the number of channels, $H$ is a height of the input planes in pixels and $W$ is the width in pixels.
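For what it's worth, here is my attempt to translate that equation into plain numpy (a sketch assuming I am reading the indices correctly; the function names `cross_correlate_2d` and `conv2d_forward` are my own, not from the docs):

```python
import numpy as np

def cross_correlate_2d(x, w):
    """Valid 2D cross-correlation of one input plane x with one kernel w."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # slide the kernel over the plane; no flipping (cross-correlation)
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def conv2d_forward(inp, weight, bias):
    """inp: (N, C_in, H, W), weight: (C_out, C_in, kh, kw), bias: (C_out,)."""
    N, C_in, H, W = inp.shape
    C_out, _, kh, kw = weight.shape
    oh, ow = H - kh + 1, W - kw + 1
    out = np.empty((N, C_out, oh, ow))
    for n in range(N):          # N_i: which sample in the batch
        for j in range(C_out):  # C_out_j: which output channel
            acc = np.full((oh, ow), bias[j])
            for k in range(C_in):  # sum over the C_in input planes
                acc += cross_correlate_2d(inp[n, k], weight[j, k])
            out[n, j] = acc
    return out
```

If I run this on an all-ones input of shape `(1, 2, 3, 3)` with an all-ones `(1, 2, 2, 2)` weight and zero bias, I get a `(1, 1, 2, 2)` output where every entry is 8 (2 channels × 4 kernel elements), which is at least self-consistent with the formula.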
Further down in the documentation I see also the following
- Input : $(N,C_{in},H_{in},W_{in})$ or $(C_{in},H_{in},W_{in})$
- Output: $(N,C_{out},H_{out},W_{out})$ or $(C_{out},H_{out},W_{out})$
$$ H_{out} = \left\lfloor \frac{H_{in} + 2\times \text{padding}[0] - \text{dilation}[0]\times(\text{kernel_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor $$ $$ W_{out} = \left\lfloor \frac{W_{in} + 2\times \text{padding}[1] - \text{dilation}[1]\times(\text{kernel_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor $$
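The shape formula itself I can at least compute directly. Here is a small helper I wrote to check it for one spatial dimension (the name `out_size` is mine, not from the API):

```python
import math

def out_size(size_in, padding, dilation, kernel_size, stride):
    """Output size along one spatial dimension per the Conv2d shape formula."""
    return math.floor(
        (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )

# e.g. a 28-pixel input, 3x3 kernel, padding 1, stride 1 keeps the size:
print(out_size(28, padding=1, dilation=1, kernel_size=3, stride=1))  # 28
# with stride 2 it halves:
print(out_size(28, padding=1, dilation=1, kernel_size=3, stride=2))  # 14
```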
I am struggling to interpret the equation calculating $\text{out}(N_i,C_{out_j})$. $N$ is defined as the batch size, but I am not 100% sure what it means in the context of the equation. It is also not clear what "input planes" means. $N_i$ and $C_{out_j}$ are not defined either; they look like indices to me, but I am not sure.
Question: Can someone please explain how to read the equation above? What are, in this context, the batch size and the input planes?
Topic pytorch programming deep-learning
Category Data Science