Understanding nn.Conv2d in PyTorch
I am trying to learn the basics of PyTorch so I can assemble my own CNNs. One thing I am also trying to learn is how to navigate the API documentation.
Specifically, at the moment I am trying to read through nn.Conv2d. I quote the documentation:
Applies a 2D convolution over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size $(N,C_{in},H,W)$ and output $(N,C_{out},H_{out},W_{out})$ can be precisely described as $$ \text{out}(N_i,C_{out_j}) = \text{bias}(C_{out_j}) + \sum_{k=0}^{C_{in}-1} \text{weight}(C_{out_j},k) * \text{input}(N_i,k) $$ where $*$ is the valid 2D cross-correlation operator, $N$ is a batch size, $C$ denotes the number of channels, $H$ is a height of the input planes in pixels and $W$ is the width in pixels.
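For what it's worth, here is my attempt to translate that equation into plain numpy (a sketch assuming I am reading the indices correctly; the function names `cross_correlate_2d` and `conv2d_forward` are my own, not from the docs):

```python
import numpy as np

def cross_correlate_2d(x, w):
    """Valid 2D cross-correlation of one input plane x with one kernel w."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # slide the kernel over the plane; no flipping (cross-correlation)
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def conv2d_forward(inp, weight, bias):
    """inp: (N, C_in, H, W), weight: (C_out, C_in, kh, kw), bias: (C_out,)."""
    N, C_in, H, W = inp.shape
    C_out, _, kh, kw = weight.shape
    oh, ow = H - kh + 1, W - kw + 1
    out = np.empty((N, C_out, oh, ow))
    for n in range(N):          # N_i: which sample in the batch
        for j in range(C_out):  # C_out_j: which output channel
            acc = np.full((oh, ow), bias[j])
            for k in range(C_in):  # sum over the C_in input planes
                acc += cross_correlate_2d(inp[n, k], weight[j, k])
            out[n, j] = acc
    return out
```

If I run this on an all-ones input of shape `(1, 2, 3, 3)` with an all-ones `(1, 2, 2, 2)` weight and zero bias, I get a `(1, 1, 2, 2)` output where every entry is 8 (2 channels × 4 kernel elements), which is at least self-consistent with the formula.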
Further down in the documentation I see also the following
- Input : $(N,C_{in},H_{in},W_{in})$ or $(C_{in},H_{in},W_{in})$
- Output: $(N,C_{out},H_{out},W_{out})$ or $(C_{out},H_{out},W_{out})$
$$ H_{out} = \left\lfloor \frac{H_{in} + 2\times \text{padding}[0] - \text{dilation}[0]\times(\text{kernel_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor $$ $$ W_{out} = \left\lfloor \frac{W_{in} + 2\times \text{padding}[1] - \text{dilation}[1]\times(\text{kernel_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor $$
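The shape formula itself I can at least compute directly. Here is a small helper I wrote to check it for one spatial dimension (the name `out_size` is mine, not from the API):

```python
import math

def out_size(size_in, padding, dilation, kernel_size, stride):
    """Output size along one spatial dimension per the Conv2d shape formula."""
    return math.floor(
        (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )

# e.g. a 28-pixel input, 3x3 kernel, padding 1, stride 1 keeps the size:
print(out_size(28, padding=1, dilation=1, kernel_size=3, stride=1))  # 28
# with stride 2 it halves:
print(out_size(28, padding=1, dilation=1, kernel_size=3, stride=2))  # 14
```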
I am struggling to interpret the equation calculating $\text{out}(N_i,C_{out_j})$. $N$ is defined as the batch size, but I am not 100% sure what it means in the context of the equation. It is also not clear what "input planes" means. $N_i$ and $C_{out_j}$ are not defined either; they look like indices to me, but I am not sure.
Question: Can someone please explain how to read the equation above? What are, in this context, the batch size and the input planes?
Topic pytorch programming deep-learning
Category Data Science