Dot product for similarity in word2vec computation in NLP
In NLP, while computing word2vec, we try to maximize $\log P(o \mid c)$, where $P(o \mid c)$ is the probability that $o$ is the outside word, given that $c$ is the center word:

$$P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w=1}^{T} \exp(u_w^\top v_c)}$$

$u_o$ is the word vector for the outside word
$v_c$ is the word vector for the center word
$T$ is the number of words in the vocabulary
The above equation is a softmax, and the dot product $u_o^\top v_c$ acts as a score, where higher should mean better. If the words $o$ and $c$ are closer, their dot product should be high, but the dot product does not behave that way, as the example below the sketch shows.
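For reference, here is a minimal numpy sketch of how I understand $P(o \mid c)$ is computed (the vocabulary size, the matrix `U`, and the vector `v_c` are made-up toy values, just to show the role of the dot product):

```python
import numpy as np

# Toy example: vocabulary of T = 4 words, embedding dimension 3.
# Each row of U is an "outside" vector u_w; v_c is the center word vector.
U = np.array([[ 0.2,  0.1, -0.3],
              [ 0.5, -0.2,  0.4],
              [-0.1,  0.3,  0.2],
              [ 0.4,  0.4, -0.1]])
v_c = np.array([0.3, -0.1, 0.2])

scores = U @ v_c                                 # dot product u_w . v_c for every word w
probs = np.exp(scores) / np.sum(np.exp(scores))  # softmax over the whole vocabulary

o = 1                                            # index of the outside word
print(probs[o], np.log(probs[o]))                # P(o|c) and the log-probability to maximize
```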
Consider vectors
A=[1, 1, 1], B=[2, 2, 2], C=[100, 100, 100]
A.B = 1 * 2 + 1 * 2 + 1 * 2 = 6
A.C = 1 * 100 + 1 * 100 + 1 * 100 = 300
Vectors A and B are closer to each other than A and C, yet the dot product A.C > A.B.
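To make the comparison concrete, here is a quick numpy check of the toy vectors above (the `cosine` helper is just my own function for comparison, not something from word2vec):

```python
import numpy as np

A = np.array([1.0, 1.0, 1.0])
B = np.array([2.0, 2.0, 2.0])
C = np.array([100.0, 100.0, 100.0])

def cosine(x, y):
    # dot product normalised by the vector lengths
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

print(np.linalg.norm(A - B), np.linalg.norm(A - C))  # Euclidean distance: ~1.73 vs ~171.5
print(np.dot(A, B), np.dot(A, C))                    # dot product: 6.0 vs 300.0
print(cosine(A, B), cosine(A, C))                    # cosine similarity: 1.0 vs 1.0
```

The raw dot product is dominated by C's magnitude, while the normalised (cosine) version treats all three vectors as pointing in the same direction.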
So the dot product is acting as a distance and not as a similarity measure. Then why is it used in the softmax?

Please help me improve my understanding.
Topic softmax word2vec word-embeddings nlp similarity
Category Data Science