Logic behind pre-trained weights and transfer learning
I am not sure about the logic behind pre-trained weights: how do they actually make sense and carry over to a new problem?
To be more specific: in an object detection network, for example, how would a model's weights trained on, say, the COCO dataset with 80 categories (classes) translate to my new problem, which has only 2 categories? How does this make sense? What meaningful features could even be transferred from the pre-trained model to my new problem, given that the number of categories has changed and I am trying to detect completely different types of objects than the original model was trained on? Why do we do transfer learning, then?
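To make the setup concrete, here is a minimal toy sketch of what I understand the usual mechanics to be. Everything in it is hypothetical: the layer names (`backbone.layer1`, `head.cls`), the feature size, and the "+1 background class" convention are assumptions for illustration, not any framework's API. The idea is that only the final classification head has a shape that depends on the number of classes, so every other pre-trained tensor can be copied as-is and only the head is re-initialized. (As far as I know, real frameworks do something analogous, e.g. swapping out the box predictor of a pre-trained detector before fine-tuning.)

```python
import random

# Hypothetical toy "detector": backbone layers whose shapes do not depend on
# the number of classes, plus a classification head whose output size does.
FEATURE_DIM = 8  # assumed size of the backbone's output feature vector


def make_layer(n_in, n_out, rng):
    """Return an n_in x n_out weight matrix filled with small random values."""
    return [[rng.gauss(0.0, 0.01) for _ in range(n_out)] for _ in range(n_in)]


def build_model(num_classes, rng):
    """Build a freshly initialized toy model for `num_classes` object classes."""
    return {
        "backbone.layer1": make_layer(3, 16, rng),
        "backbone.layer2": make_layer(16, FEATURE_DIM, rng),
        # +1 output for a background class (an assumed convention here)
        "head.cls": make_layer(FEATURE_DIM, num_classes + 1, rng),
    }


def transfer(pretrained, num_new_classes, rng):
    """Copy every pre-trained matrix whose shape still fits; keep fresh init for the rest."""
    new_model = build_model(num_new_classes, rng)
    reused = []
    for name, weights in pretrained.items():
        target = new_model[name]
        same_shape = len(weights) == len(target) and len(weights[0]) == len(target[0])
        if same_shape:
            new_model[name] = [row[:] for row in weights]  # copy pre-trained values
            reused.append(name)
    return new_model, reused


rng = random.Random(0)
coco_model = build_model(num_classes=80, rng=rng)  # stands in for COCO-pretrained weights
my_model, reused = transfer(coco_model, num_new_classes=2, rng=rng)
print("reused:", reused)
print("new head shape:", len(my_model["head.cls"]), "x", len(my_model["head.cls"][0]))
```

Running it should print that both backbone layers were reused and that the new head is 8 x 3 (2 classes plus background). So the change from 80 to 2 classes only discards the last layer; my question is really why the copied backbone weights would still be useful for completely different objects.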