How to build a symmetric similarity model on top of embeddings?

I have two equal length vectors that come out of two identical embedding layers.

I want to calculate their similarity, and I don't trust the embedding layer enough to just use the dot product (e.g. it's plausible that different coordinates interact with respect to overall similarity). I want to learn this from examples of good and bad pairs, without retraining the initial embedding.

What I'd like to do is to somehow combine the two vectors using another layer, and then connect this layer to an output layer to get the final decision (similar/not similar).

The trivial way is to add another layer, and fully connect the two concatenated embedding vectors to this new layer. The downside is that the model is not symmetric, which makes the search space bigger than it should be.

Is there a better way?

Ideas I have so far:

  1. Take the outer product of the two vectors, then do the learning on the $n \times n$ output (easy enough, but it might have too many weights to learn).

  2. Create two mirrored neural nets whose weights are tied (as if mirror images of each other). Intuitively, this allows arbitrary combinations of coordinates from both sides to feed the next layer (how can tying the weights be done in Keras?).

Is there a better way?

Bonus points: Beyond symmetry, how does one encourage transitivity and reflexivity? Just by adding a ton of trivial $(x, x)$ examples?

Topic: keras, word-embeddings, similarity

Category: Data Science


That's an interesting question! Surprisingly, I have never come across scientific papers on this particular problem. What you would like to learn is a symmetric function that maps two input vectors to a scalar. In a more general setting, the literature on permutation-invariant functions is relevant (see the Set Transformer and aggregation schemes for graph neural networks). Basically, you have two options:

  1. As you pointed out, you can take any non-symmetric function, such as concatenation + a feed-forward network, and train it on both $(x, y)$ and $(y, x)$ pairs. But there might be a smarter way...
  2. Design a permutation-invariant architecture. Your advantage here is that you only have two embeddings, $e_i$ and $e_j$. Sadly, you say their raw dot product cannot be trusted. I would then suggest applying a linear/nonlinear transformation to your embeddings, i.e. defining a trainable matrix $Q$ and computing the dot product in the new space $v_i = Q e_i$ and $v_j = Q e_j$. By training $Q$ on example pairs $(e_i, e_j)$, you will find a transformation of the embeddings such that the dot product $v_i \cdot v_j$ fits the true similarity. You can of course use a more complex transformation (multiple nonlinear layers, for example). The key point is that using the same parameters to transform both embeddings guarantees symmetry; a minimal Keras sketch follows below this list.
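Here is that minimal Keras sketch of proposal (2). It assumes the two embeddings arrive as precomputed float vectors of some placeholder size `EMB_DIM`, and it adds an arbitrary hidden width of 64, cosine normalization, and a sigmoid output on top of the dot product; none of those details are prescribed above, only the shared transformation $Q$ is.

```python
# A sketch of proposal (2): a shared trainable transformation Q applied to
# both embeddings, followed by a dot product. EMB_DIM, the hidden width of
# 64, the cosine normalization and the sigmoid head are all placeholder
# choices, not requirements.
from tensorflow import keras
from tensorflow.keras import layers

EMB_DIM = 128  # dimensionality of the frozen embeddings (placeholder)

e_i = keras.Input(shape=(EMB_DIM,), name="embedding_i")
e_j = keras.Input(shape=(EMB_DIM,), name="embedding_j")

# Shared transformation Q: reusing the SAME layer instance on both inputs
# ties the weights, which is exactly what makes the model symmetric.
Q = layers.Dense(64, use_bias=False, name="shared_Q")
v_i = Q(e_i)
v_j = Q(e_j)

# Dot product in the transformed space; normalize=True makes it a cosine
# similarity, keeping the raw score in [-1, 1].
score = layers.Dot(axes=1, normalize=True)([v_i, v_j])

# Map the similarity score to a probability of "similar".
out = layers.Dense(1, activation="sigmoid")(score)

model = keras.Model(inputs=[e_i, e_j], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```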

Bonus points: I am not sure I fully understand your question, but my proposal in (2) might naturally deal with transitivity and reflexivity; see the quick check below.
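As a quick check of that claim, continuing the hypothetical sketch above (`model` and `EMB_DIM` carry over): symmetry holds exactly because both inputs pass through the same shared layer, and with the cosine normalization every $(x, x)$ pair receives the same raw similarity of 1, so a few trivial positive $(x, x)$ examples during training are enough to push its output toward "similar".

```python
import numpy as np

# Random stand-in data, only to illustrate the properties (hypothetical).
x = np.random.rand(4, EMB_DIM).astype("float32")
y = np.random.rand(4, EMB_DIM).astype("float32")

# Symmetry: swapping the inputs gives identical predictions, since both
# branches go through the same shared layer Q.
assert np.allclose(model.predict([x, y]), model.predict([y, x]))

# Reflexivity: every (x, x) pair has cosine similarity 1 before the output
# layer, so all reflexive pairs receive the same score.
print(model.predict([x, x]))
```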
