How does the ResNet model restore the skipped layers as it learns the feature space?
From the description of ResNet on Wikipedia: it is mentioned that the ResNet model uses fewer layers in the initial training stages. This speeds up learning by reducing the impact of vanishing gradients, as there are fewer layers to propagate through. The network then gradually restores the skipped layers as it learns the feature space.
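To make the question concrete, here is a minimal sketch of how I understand a residual block, written with PyTorch as an assumption (that is what I used for the toy examples); the class name `BasicResidualBlock` is just for illustration, not the actual torchvision implementation:

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Simplified residual block: output = F(x) + x (identity skip connection)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                      # the "skip" path: x is carried forward unchanged
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))   # the residual function F(x)
        out = out + identity              # add the skip connection back in
        return self.relu(out)

# If F(x) is near zero early in training, the block behaves roughly like an
# identity mapping, which is what I understand "skipping" the layer to mean.
x = torch.randn(1, 64, 32, 32)
block = BasicResidualBlock(64)
print(block(x).shape)  # torch.Size([1, 64, 32, 32])
```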
I am new to this domain and have tried using resnet50 and resnet18 for toy examples, but I am not sure about these theoretical parts. Could someone help answer the queries below, or point me to study material?
How does the ResNet model restore the skipped layers as it learns the feature space?
It seems that certain layers are skipped because of the vanishing gradient problem, but why don't the adjacent layers (whose activations are used when skipping those layers) face the same issue?
Topic machine-learning-model
Category Data Science