Word2Vec: Identifying many-to-one relationships between words
Standard introductory examples in Word2Vec, like king - queen = man - woman and tokyo - japan = london - uk, involve one-to-one relationships between words: Tokyo is the exclusive capital of Japan.
More generally, we might want to test for many-to-one relationships: e.g. we might want to ask if Kyoto is a city in Japan. I presume we are still interested in vectors of the form kyoto - japan, houston - us, etc., but these vectors are no longer equal.
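To make this concrete, here is a minimal sketch of the comparison I have in mind. The 3-dimensional vectors in `emb` are invented purely for illustration (real Word2Vec vectors would come from a trained model), and the helper names `offset` and `cosine` are my own, not part of any library.

```python
import numpy as np

# Toy "embeddings", made up for illustration only; they are NOT real
# Word2Vec vectors and carry no empirical meaning.
emb = {
    "tokyo":   np.array([1.0, 2.0, 0.5]),
    "kyoto":   np.array([0.8, 2.1, 0.9]),
    "japan":   np.array([0.0, 1.0, 0.0]),
    "houston": np.array([2.0, 0.1, 1.2]),
    "us":      np.array([1.0, -1.0, 0.5]),
}

def offset(city, country):
    """Relationship vector city - country (hypothetical helper)."""
    return emb[city] - emb[country]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

pairs = [("tokyo", "japan"), ("kyoto", "japan"), ("houston", "us")]
offsets = {p: offset(*p) for p in pairs}

# Compare every pair of offset vectors: are they equal, and if not,
# how closely do their directions agree?
for i, p in enumerate(pairs):
    for q in pairs[i + 1:]:
        equal = np.allclose(offsets[p], offsets[q])
        sim = cosine(offsets[p], offsets[q])
        print(p, q, "equal:", equal, "cosine:", round(sim, 3))
```

In the many-to-one case the equality check fails by construction (kyoto and tokyo are different vectors, so kyoto - japan cannot equal tokyo - japan), which leaves only something like the cosine similarity between offsets to look at. What I am unsure about is whether that is a principled test.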
Do these relationship vectors form a particularly interesting vector space? Do they sample some known distribution? How can I check a many-to-one relationship from the word embeddings?
Topic vector-space-models ai word2vec word-embeddings nlp
Category Data Science