How do gradients flow back to the network in a Siamese architecture? How can all the CNN models have the same weights when they are different models?
TL;DR: What is the intuition behind gradient flow in a Siamese network? How can 3 models share the same weights? And if only 1 model is used, how are its weights updated from 3 different gradient paths?
I am trying to build a Siamese network. As far as I know, if I build a Siamese network based on triplet loss, I have to use 3 different networks. For simplicity, let us say my architecture is something like the following (please correct it if it is wrong):
```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

image_shape = (224, 224, 3)  # assumed input size

# One input per element of the triplet (anchor, positive, negative)
I1 = Input(shape=image_shape)
I2 = Input(shape=image_shape)
I3 = Input(shape=image_shape)

# Three separate ResNet50 instances, one per input
res_m_1 = ResNet50(include_top=False, weights='imagenet', input_tensor=I1, pooling='avg')
res_m_2 = ResNet50(include_top=False, weights='imagenet', input_tensor=I2, pooling='avg')
res_m_3 = ResNet50(include_top=False, weights='imagenet', input_tensor=I3, pooling='avg')
x1 = res_m_1.output
x2 = res_m_2.output
x3 = res_m_3.output
# Use x = Flatten()(x) instead if not using any pooling layer

# ------------------------------------------------------------
# NEED HELP AFTER THIS ONE: HOW TO BUILD THE ARCHITECTURE
# ------------------------------------------------------------

siamese_model = Model(inputs=[I1, I2, I3], outputs=final_output)
siamese_model.compile(loss=some_triplet_loss, optimizer=Adam(), metrics=['acc'])
# fit_generator is deprecated in TF 2.x; fit accepts generators directly
siamese_model.fit(train_gen, steps_per_epoch=1000, epochs=10, validation_data=validation_data)
```
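For reference, `some_triplet_loss` above is usually meant to be the standard triplet loss (the formulation used in FaceNet). Writing it out makes the weight sharing explicit: there is a single parameter set $\theta$, and the anchor $a$, positive $p$ and negative $n$ all pass through the same embedding function $f_\theta$, with $\alpha$ as the margin. These symbols are notation chosen for this sketch, not names from the code. Because $\theta$ appears in all three terms, the chain rule gives its gradient as a sum of one contribution per path, where the partial derivatives of $\mathcal{L}$ treat the three embeddings as separate variables:

$$
\mathcal{L}(a, p, n) = \max\left(0,\; \lVert f_\theta(a) - f_\theta(p) \rVert_2^2 - \lVert f_\theta(a) - f_\theta(n) \rVert_2^2 + \alpha\right)
$$

$$
\frac{\partial \mathcal{L}}{\partial \theta}
= \frac{\partial \mathcal{L}}{\partial f_\theta(a)} \frac{\partial f_\theta(a)}{\partial \theta}
+ \frac{\partial \mathcal{L}}{\partial f_\theta(p)} \frac{\partial f_\theta(p)}{\partial \theta}
+ \frac{\partial \mathcal{L}}{\partial f_\theta(n)} \frac{\partial f_\theta(n)}{\partial \theta}
$$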
My understanding and question: If there are 3 networks in the architecture, how can they produce outputs using the same weights? How do these networks share weights?
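A minimal sketch of the weight-sharing idea, written in plain Python so it runs without Keras: the "three networks" are really one encoder object called on three inputs. The `Encoder` class and all numbers are hypothetical, chosen only for illustration. In Keras, the analogous pattern would be to build a single base model once and call it on `I1`, `I2` and `I3`, since calling a Keras model on a new input reuses its layers and weights rather than copying them.

```python
class Encoder:
    """Hypothetical toy encoder: one linear layer y = W x with no bias."""

    def __init__(self, weights):
        self.weights = weights  # 2-D weight matrix stored as a list of rows

    def __call__(self, x):
        # Matrix-vector product; every call reads the SAME self.weights
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in self.weights]


encoder = Encoder([[1.0, 0.0], [0.5, -1.0]])

# The three "branches" of a triplet Siamese model are three calls of one encoder
anchor, positive, negative = [1.0, 2.0], [1.0, 1.5], [-2.0, 0.0]
e_a = encoder(anchor)
e_p = encoder(positive)
e_n = encoder(negative)

# Changing the weights once changes every branch at the same time
encoder.weights[0][0] = 2.0
print(encoder(anchor))  # prints [2.0, -1.5]
```

Because every branch reads the same weight matrix, there is nothing to keep in sync: a single update is seen by all three branches.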
Also, suppose it is just one network (I can't picture how that would work, so please help). In the first epoch it will produce outputs using the initial weights (the ImageNet weights, if used). But when the gradients flow back, how is that single network updated? There are 3 different paths leading out of the same model, so how do the gradients flow back along them? Doing it in parallel seems impossible (I can't see how), and doing it sequentially doesn't seem to work either: the outputs were produced one after another, but I don't see how the gradients could flow back that way.
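A small numerical sketch of how the gradients from the three paths combine, assuming a toy shared encoder (a dot product with a weight vector `w`) and the usual triplet loss with a margin of 1.0; all function names and values are hypothetical. By the multivariate chain rule, the gradient of the loss with respect to the shared `w` is the sum of the contributions flowing back through the anchor, positive and negative branches. A central finite-difference check confirms that this sum is the true gradient.

```python
def embed(w, x):
    # Toy shared encoder: a scalar embedding w . x; every branch uses the same w
    return sum(wi * xi for wi, xi in zip(w, x))


def triplet_loss(w, a, p, n, margin=1.0):
    e_a, e_p, e_n = embed(w, a), embed(w, p), embed(w, n)
    return max(0.0, (e_a - e_p) ** 2 - (e_a - e_n) ** 2 + margin)


def grad_by_branch(w, a, p, n, margin=1.0):
    """Gradient contribution of each branch; the total gradient is their sum."""
    e_a, e_p, e_n = embed(w, a), embed(w, p), embed(w, n)
    if (e_a - e_p) ** 2 - (e_a - e_n) ** 2 + margin <= 0.0:
        zero = [0.0] * len(w)
        return {"anchor": zero, "positive": zero, "negative": zero}
    # dL/d(branch output), treating e_a, e_p, e_n as separate variables
    dL_de_a = 2 * (e_a - e_p) - 2 * (e_a - e_n)
    dL_de_p = -2 * (e_a - e_p)
    dL_de_n = 2 * (e_a - e_n)
    # Chain rule through each branch: d(embed(w, x))/dw = x
    return {
        "anchor": [dL_de_a * xi for xi in a],
        "positive": [dL_de_p * xi for xi in p],
        "negative": [dL_de_n * xi for xi in n],
    }


def total_grad(w, a, p, n, margin=1.0):
    # Gradients arriving at the shared weights from all paths are added
    parts = grad_by_branch(w, a, p, n, margin)
    return [sum(parts[k][i] for k in parts) for i in range(len(w))]


def numeric_grad(w, a, p, n, margin=1.0, h=1e-6):
    # Central finite differences, used only to check the analytic sum
    g = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += h
        w_minus = list(w)
        w_minus[i] -= h
        diff = triplet_loss(w_plus, a, p, n, margin) - triplet_loss(w_minus, a, p, n, margin)
        g.append(diff / (2 * h))
    return g


w = [0.5, -0.3]
a, p, n = [1.0, 2.0], [1.5, 1.0], [0.8, 2.2]
print(total_grad(w, a, p, n))
print(numeric_grad(w, a, p, n))  # should agree with the line above to ~6 decimals

# One optimizer step updates the single shared weight vector exactly once
learning_rate = 0.1
w_updated = [wi - learning_rate * gi for wi, gi in zip(w, total_grad(w, a, p, n))]
print(triplet_loss(w, a, p, n), triplet_loss(w_updated, a, p, n))
```

Autodiff frameworks do the same bookkeeping: during one backward pass, the gradients that reach a shared variable from different branches are accumulated (added), and the optimizer then applies a single update to that one set of weights. Nothing needs to be processed branch by branch.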
Topic siamese-networks cnn deep-learning neural-network machine-learning
Category Data Science