How do gradients flow back to the network in a Siamese architecture? How can the weights of all CNN branches be the same when using different models?

TL;DR: What is the intuition behind the gradient flow in a Siamese network? How can 3 models share the same weights? And if only 1 model is used, how are gradients updated from 3 different paths?

I am trying to build a Siamese network, and as far as I know, if I want to train it with a triplet loss, I have to use 3 different networks. So for simplicity, let us say that my architecture is something like the following (please correct the architecture if it is wrong):

    from tensorflow.keras.layers import Input
    from tensorflow.keras.models import Model
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.applications import ResNet50

    I1 = Input(shape=image_shape)
    I2 = Input(shape=image_shape)
    I3 = Input(shape=image_shape)

    res_m_1 = ResNet50(include_top=False, weights='imagenet', input_tensor=I1, pooling='avg')
    res_m_2 = ResNet50(include_top=False, weights='imagenet', input_tensor=I2, pooling='avg')
    res_m_3 = ResNet50(include_top=False, weights='imagenet', input_tensor=I3, pooling='avg')

    x1 = res_m_1.output
    x2 = res_m_2.output
    x3 = res_m_3.output

    # x = Flatten()(x)  # use this instead if not using any pooling layer

    ##### ------- ---------------------------- --------- ########

    'NEED HELP AFTER THIS ONE; HOW TO BUILD ARCHITECTURE'
    
    ########### ------------------------------------ ###########

    siamese_model = Model(inputs=[I1, I2, I3], outputs=final_output)

    siamese_model.compile(loss=some_triplet_loss, optimizer=Adam(), metrics=['acc'])

    siamese_model.fit_generator(train_gen, steps_per_epoch=1000, epochs=10, validation_data=validation_data)

My understanding and question: If there are 3 networks in the architecture, how can they produce outputs with the same weights? How are these networks sharing weights?

Also, let us suppose it is just one network (I am unable to see how, please help). At the first epoch it will produce outputs with the default weights (ImageNet, if used). But when the gradients flow back to the network, how are they applied? There are 3 different paths through the same model, so how do the gradients flow back along those paths? I cannot see how it could happen in parallel, and I do not see how it could happen sequentially either, because the outputs were produced sequentially but the gradients cannot simply flow back the same way.

Topics: siamese-networks, cnn, deep-learning, neural-network, machine-learning

Category: Data Science


You do three forward passes for the three inputs and calculate one loss, so some modules (maybe all) are used three times. Because the gradients depend on the inputs, three gradients get calculated and accumulated into a single update for the shared weights.

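To make the accumulation concrete, here is a minimal sketch (a toy dense embedder stands in for ResNet50; all names are illustrative, and the 0.2 margin is an arbitrary choice). One set of weights is used in three forward passes, and the backward pass produces a single gradient per weight:

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    # Toy embedding network standing in for ResNet50 (illustrative only)
    inp = layers.Input(shape=(128,))
    embedder = Model(inp, layers.Dense(32)(inp))

    anchor = tf.random.normal((8, 128))
    positive = tf.random.normal((8, 128))
    negative = tf.random.normal((8, 128))

    with tf.GradientTape() as tape:
        # Three forward passes through the SAME model, i.e. the same weights
        a, p, n = embedder(anchor), embedder(positive), embedder(negative)
        # One triplet loss computed over all three outputs
        loss = tf.reduce_mean(tf.maximum(
            tf.reduce_sum(tf.square(a - p), axis=1)
            - tf.reduce_sum(tf.square(a - n), axis=1) + 0.2, 0.0))

    # One gradient tensor per weight: the contributions of the three paths are summed
    grads = tape.gradient(loss, embedder.trainable_variables)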


You only create ONE model for a siamese network, and you pass your inputs to it.

(You have created three models in your example.)

So in your triplet case you pass the three inputs separately through that one network, compute a loss, and backpropagate it.

The gradients are then applied just like when training any other neural net.
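A minimal sketch of that idea in Keras (merging the three embeddings by concatenation and unpacking them inside `triplet_loss` is just one possible design; `image_shape` and the margin are assumed values):

    import tensorflow as tf
    from tensorflow.keras.layers import Input, Concatenate
    from tensorflow.keras.models import Model
    from tensorflow.keras.applications import ResNet50

    image_shape = (224, 224, 3)  # assumed input size

    # ONE embedding network; its weights are reused for all three inputs
    base = ResNet50(include_top=False, weights='imagenet',
                    input_shape=image_shape, pooling='avg')

    I1 = Input(shape=image_shape)  # anchor
    I2 = Input(shape=image_shape)  # positive
    I3 = Input(shape=image_shape)  # negative

    # Calling the same model object three times shares its weights
    x1, x2, x3 = base(I1), base(I2), base(I3)
    merged = Concatenate()([x1, x2, x3])

    def triplet_loss(y_true, y_pred, margin=0.2):
        # Split the concatenated embeddings back into anchor/positive/negative
        d = y_pred.shape[1] // 3
        a, p, n = y_pred[:, :d], y_pred[:, d:2 * d], y_pred[:, 2 * d:]
        return tf.reduce_mean(tf.maximum(
            tf.reduce_sum(tf.square(a - p), axis=1)
            - tf.reduce_sum(tf.square(a - n), axis=1) + margin, 0.0))

    siamese_model = Model(inputs=[I1, I2, I3], outputs=merged)
    siamese_model.compile(loss=triplet_loss, optimizer='adam')

Note that `triplet_loss` ignores `y_true` entirely, so the training generator only needs to yield dummy labels alongside the three input images.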
