How does the ResNet model restore the skipped layers as it learns the feature space?
From the description of ResNet on Wikipedia: it is mentioned that the ResNet model uses fewer layers in the initial training stages. This speeds up learning by reducing the impact of vanishing gradients, as there are fewer layers to propagate through. The network then gradually restores the skipped layers as it learns the feature space.
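To make the question concrete, here is a minimal sketch of how I understand a residual block, written with PyTorch as an assumption (that is what I used for the toy examples); the class name `BasicResidualBlock` is just for illustration, not the actual torchvision implementation:

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Simplified residual block: output = F(x) + x (identity skip connection)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                      # the "skip" path: x is carried forward unchanged
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))   # the residual function F(x)
        out = out + identity              # add the skip connection back in
        return self.relu(out)

# If F(x) is near zero early in training, the block behaves roughly like an
# identity mapping, which is what I understand "skipping" the layer to mean.
x = torch.randn(1, 64, 32, 32)
block = BasicResidualBlock(64)
print(block(x).shape)  # torch.Size([1, 64, 32, 32])
```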
I am new to this domain and have tried using resnet50 and resnet18 for toy examples, but I am not sure about these theoretical parts. Could someone help answer the queries below, or point me to study material?
How does the ResNet model restore the skipped layers as it learns the feature space?
It seems that certain layers are skipped because of the vanishing gradient problem, but why don't the adjacent layers (whose activations are used when skipping those layers) face the same issue?
Topic machine-learning-model
Category Data Science