Why do Siamese neural networks use tied weights, and how do they work?
While reading the one-shot learning paper "Siamese Neural Networks for One-shot Image Recognition", I was introduced to the idea of Siamese neural networks.
What I did not fully grasp was what they meant by this line:
This objective is combined with standard backpropagation algorithm, where the gradient is additive across the twin networks due to the tied weights.
Firstly, how exactly are the weights tied? Below, I believe I've reproduced the formula by which they compute the weight update. $T$ is the epoch, $\mu_j$ is the momentum, $\lambda_j$ is the regularization parameter, $\eta_j$ is the learning rate, and I believe $w_{kj}$ is the weight between neuron $k$ in one layer and neuron $j$ in the next, but correct me if I'm wrong.
\begin{equation}
\begin{array}{c}
\mathbf{w}_{kj}^{(T)}\left(x_{1}^{(i)}, x_{2}^{(i)}\right)=\mathbf{w}_{kj}^{(T)}+\Delta \mathbf{w}_{kj}^{(T)}\left(x_{1}^{(i)}, x_{2}^{(i)}\right)+2 \lambda_{j}\left|\mathbf{w}_{kj}\right| \\
\Delta \mathbf{w}_{kj}^{(T)}\left(x_{1}^{(i)}, x_{2}^{(i)}\right)=-\eta_{j} \nabla w_{kj}^{(T)}+\mu_{j} \Delta \mathbf{w}_{kj}^{(T-1)}
\end{array}
\end{equation}
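To make the first question concrete, here is a minimal PyTorch sketch of how I currently understand the tying. This is my own toy example, not the authors' code or architecture: the layer sizes, the random inputs, and the L1-distance-plus-sigmoid head are just my reading of the paper. Both inputs pass through the same module, so there is only one set of parameters, and autograd accumulates the gradient contributions from the two forward passes into the same `.grad` tensors:

```python
import torch
import torch.nn as nn

# Toy embedding network (made-up sizes, not the paper's convnet).
embed = nn.Sequential(
    nn.Linear(10, 16),
    nn.ReLU(),
    nn.Linear(16, 8),
)

x1 = torch.randn(4, 10)  # first image of each pair (random stand-ins)
x2 = torch.randn(4, 10)  # second image of each pair

# "Tied weights": the SAME module (hence the same weight tensors)
# processes both inputs, so there is really only one set of parameters.
h1 = embed(x1)
h2 = embed(x2)

# L1 distance between the twin outputs, squashed to a similarity score,
# as I understand the paper's output layer.
score = torch.sigmoid((h1 - h2).abs().sum(dim=1))
target = torch.ones(4)  # pretend every pair is a same-class pair
loss = nn.functional.binary_cross_entropy(score, target)

loss.backward()

# Because both forward passes used the same weight tensors, autograd sums
# the contribution from each twin into a single .grad per parameter.
print(embed[0].weight.grad.shape)
```

Is that accumulation of the two branches' contributions into one gradient what they mean by the gradient being "additive across the twin networks"?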
My other question is: why is this even desirable? Why not just reuse the same network twice? Or will the two networks end up identical after training anyway? And if they do end up identical, why set it up like this? What benefits does it have?
Topic siamese-networks one-shot-learning gradient-descent neural-network
Category Data Science