Cosine vs Manhattan for Text Similarity

I'm storing sentences in Elasticsearch as a dense_vector field and used BERT for the embeddings, so each vector is 768-dimensional. Elasticsearch offers similarity function options such as Euclidean, Manhattan, and cosine similarity. I have tried them, and both Manhattan and cosine give me very similar, good results, so now I don't know which one I should choose.
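For reference, here is a minimal sketch of how the two scoring functions can be tried side by side through Elasticsearch's script_score Painless functions. The index name `sentences`, the field name `embedding`, and the zero query vector are all placeholders, not taken from the question:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
query_vector = [0.0] * 768  # stand-in for a real 768-dim BERT embedding

def search(script_source):
    """Score every document with the given Painless expression."""
    return es.search(
        index="sentences",
        query={
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": script_source,
                    "params": {"query_vector": query_vector},
                },
            }
        },
    )

# cosineSimilarity ranges in [-1, 1]; +1.0 keeps scores non-negative.
by_cosine = search("cosineSimilarity(params.query_vector, 'embedding') + 1.0")
# l1norm is a distance, so invert it to turn it into a similarity score.
by_manhattan = search("1 / (1 + l1norm(params.query_vector, 'embedding'))")
```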

Topic: semantic-similarity, manhattan, similar-documents, cosine-distance

Category: Data Science


Intuitively, if you normalized the vectors before using them, or if they all ended up having almost unit norm after training, then a small $l_1$ distance between two vectors implies that the angle between them is small, hence the cosine similarity is high. Conversely, almost collinear unit vectors have almost equal coordinates, so their $l_1$ distance is small. So if one works well, the other will work well too.

To see this, recall the equivalence of the $l_1$ and $l_2$ norms on $\mathbb{R}^n$, in particular that $||x||_2 \le ||x||_1$ for any $x \in \mathbb{R}^n$. We can use this to prove the first of the two statements (the other is left as an exercise ;)

If $||u||_2 = ||v||_2 = 1$ and $||u-v||_1 \le \sqrt{2\epsilon}$, then $\langle u, v \rangle \ge 1 - \epsilon$.

To prove this, expand $||u-v||_2^2 = 2 - 2 \langle u, v \rangle$ and rearrange to obtain:

$$\langle u, v \rangle = 1 - \frac{1}{2} ||u-v||_2^2 \ge 1- \frac{1}{2} ||u-v||_1^2 \ge 1 - \epsilon.$$
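As a quick sanity check of this bound, here is a small numpy sketch (vector dimension and perturbation scale chosen arbitrarily) that draws pairs of unit vectors and verifies the implication numerically:

```python
import numpy as np

# Numerically check: if ||u||_2 = ||v||_2 = 1 and ||u - v||_1 <= sqrt(2*eps),
# then <u, v> >= 1 - eps. Pairs that violate the hypothesis are skipped.
rng = np.random.default_rng(0)
eps = 0.1
checked = 0
for _ in range(1000):
    u = rng.normal(size=768)
    u /= np.linalg.norm(u)                       # ||u||_2 = 1
    v = u + 3e-4 * rng.normal(size=768)          # small perturbation of u
    v /= np.linalg.norm(v)                       # ||v||_2 = 1
    if np.abs(u - v).sum() <= np.sqrt(2 * eps):  # hypothesis: ||u-v||_1 <= sqrt(2*eps)
        assert u @ v >= 1 - eps                  # conclusion: cosine >= 1 - eps
        checked += 1
print(f"bound verified on {checked} pairs")
```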

So in the end, which one you choose is up to you. One reason to prefer cosine similarity is the differentiability of the scalar product, which, if you assume normalized (unit-norm) vectors, is all you need to compute it.
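To see this agreement empirically, the sketch below compares the top-$k$ neighbours returned by cosine similarity and by Manhattan distance. Random unit vectors stand in for the 768-dim BERT embeddings (no Elasticsearch involved); on normalized vectors the two rankings typically overlap heavily:

```python
import numpy as np

rng = np.random.default_rng(1)
docs = rng.normal(size=(1000, 768))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # normalize each row
query = docs[0] + 0.05 * rng.normal(size=768)        # query near one document
query /= np.linalg.norm(query)

cosine = docs @ query                            # cosine similarity (unit vectors)
manhattan = np.abs(docs - query).sum(axis=1)     # l1 (Manhattan) distance

k = 10
top_cos = set(np.argsort(-cosine)[:k])           # highest similarity first
top_man = set(np.argsort(manhattan)[:k])         # smallest distance first
print(f"overlap in top-{k}: {len(top_cos & top_man)}/{k}")
```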
