Why do Siamese neural networks use tied weights, and how do they work?
While reading the one-shot learning paper "Siamese Neural Networks for One-shot Image Recognition", I was introduced to the idea of Siamese neural networks.
What I did not fully grasp was what they meant by this line:
This objective is combined with standard backpropagation algorithm, where the gradient is additive across the twin networks due to the tied weights.
Firstly, how exactly are the weights tied? Below, I believe I've reproduced the formula by which they compute the weight update. $T$ is the epoch, $\mu_j$ is the momentum, $\lambda_j$ is the regularization parameter, $\eta_j$ is the learning rate, and I believe $w_{kj}$ is the weight between neuron $k$ in one layer and neuron $j$ in the next, but correct me if I'm wrong.
\begin{equation}
\begin{array}{c}
\mathbf{w}_{kj}^{(T)}\left(x_{1}^{(i)}, x_{2}^{(i)}\right)=\mathbf{w}_{kj}^{(T)}+\Delta \mathbf{w}_{kj}^{(T)}\left(x_{1}^{(i)}, x_{2}^{(i)}\right)+2 \lambda_{j}\left|\mathbf{w}_{kj}\right| \\
\Delta \mathbf{w}_{kj}^{(T)}\left(x_{1}^{(i)}, x_{2}^{(i)}\right)=-\eta_{j} \nabla w_{kj}^{(T)}+\mu_{j} \Delta \mathbf{w}_{kj}^{(T-1)}
\end{array}
\end{equation}
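To make the first question concrete, here is a minimal PyTorch sketch of how I currently understand the tying. This is my own toy example, not the authors' code or architecture: the layer sizes, the random inputs, and the L1-distance-plus-sigmoid head are just my reading of the paper. Both inputs pass through the same module, so there is only one set of parameters, and autograd accumulates the gradient contributions from the two forward passes into the same `.grad` tensors:

```python
import torch
import torch.nn as nn

# Toy embedding network (made-up sizes, not the paper's convnet).
embed = nn.Sequential(
    nn.Linear(10, 16),
    nn.ReLU(),
    nn.Linear(16, 8),
)

x1 = torch.randn(4, 10)  # first image of each pair (random stand-ins)
x2 = torch.randn(4, 10)  # second image of each pair

# "Tied weights": the SAME module (hence the same weight tensors)
# processes both inputs, so there is really only one set of parameters.
h1 = embed(x1)
h2 = embed(x2)

# L1 distance between the twin outputs, squashed to a similarity score,
# as I understand the paper's output layer.
score = torch.sigmoid((h1 - h2).abs().sum(dim=1))
target = torch.ones(4)  # pretend every pair is a same-class pair
loss = nn.functional.binary_cross_entropy(score, target)

loss.backward()

# Because both forward passes used the same weight tensors, autograd sums
# the contribution from each twin into a single .grad per parameter.
print(embed[0].weight.grad.shape)
```

Is that accumulation of the two branches' contributions into one gradient what they mean by the gradient being "additive across the twin networks"?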
My other question is: why is this even desirable? Why not just reuse the same network twice? Or will the two networks end up identical after training anyway? And if they do end up identical, why set it up like this? What benefits does it have?
Topic siamese-networks one-shot-learning gradient-descent neural-network
Category Data Science