Paper about AE-CNN is unclear: how to derive the number of layers of a dense block?
I am implementing the algorithm called Automatically Evolving CNN (AE-CNN).
Some things aren't specified, which makes it hard to understand what the paper actually means. In Section 3.2, Encoding Strategy, it says:
> ... Note that the number of convolutional layers in a DB is known because it can be derived by the spatial sizes of input and output as well as k. ...
By DB, the paper means a single dense block from the DenseNet architecture, and k is the dense block's growth rate: the number of feature maps each convolutional layer adds. Later on, the paper restricts the allowed values of k to just 12, 20, and 40.
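For concreteness, my understanding of the standard DenseNet channel arithmetic (this is my assumption; the paper doesn't spell it out) is that each of the L conv layers in a block concatenates k new feature maps onto its input, so a 16-channel input passed through a block with k = 12 and L = 4 would come out with 16 + 4 · 12 = 64 channels.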
I have two questions about this statement:
- Given the shapes of the input and output tensors and a growth rate, how would one derive the number of layers?
As I understand it, the block cannot change the width and height of the feature maps (except by omitting padding, which is not the case here), nor, as far as I can tell, can it change the depth of a tensor. Under my reading, the depth of the output tensor can only equal the chosen k (I believe the other feature maps must be discarded after the last conv layer). The only derivation I can construct is the one sketched after this list, and it contradicts this reading.
- The statement only talks about single DBs, but later on the paper says that it groups multiple DBs into units (DBUs) and that the input and output tensor shapes are only specified per unit. This seems contradictory. How should I interpret this? Would the statement make sense if it talked about DBUs instead?
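For reference, here is the only derivation I can come up with, a minimal sketch of the standard DenseNet channel arithmetic (the function and the example numbers are mine, not from the paper). It only works if the block (or unit) is allowed to change the tensor depth, which is exactly what I'm unsure about:

```python
# Minimal sketch of the standard DenseNet channel arithmetic (my assumption,
# not something the AE-CNN paper states). Each conv layer in a dense block
# keeps the spatial size and concatenates k new feature maps onto its input,
# so after L layers the depth has grown from in_depth to in_depth + L * k.

def derive_num_layers(in_depth: int, out_depth: int, k: int) -> int:
    """Solve out_depth = in_depth + L * k for the number of conv layers L."""
    delta = out_depth - in_depth
    if delta <= 0 or delta % k != 0:
        raise ValueError(
            f"no integer layer count maps depth {in_depth} to {out_depth} "
            f"with growth rate k={k}"
        )
    return delta // k

# Hypothetical example: a 16-channel input evolved into a 64-channel output
# with k = 12 would imply (64 - 16) / 12 = 4 conv layers.
print(derive_num_layers(16, 64, 12))  # -> 4
```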
Note: The paper isn't specific about this, but I interpreted it to mean that the depth of either the input tensor or the output tensor (but not both) is evolved.
Edit: A follow-up question: would it make sense to evolve this parameter (the number of layers in a DB or DBU)?