Can I average the BERT embeddings of multiple instances of the same word to get one vector representation of the word?
In the project I'm working on right now, I would like to get one embedding for every unique lemma in a corpus. Could I get this by averaging the embeddings of every instance of that lemma?
For example, say the corpus contains 500 tokens of the lemma *walk* (regardless of conjugation): could I then add, average, or concatenate these 500 embeddings to get one embedding that accurately represents all of them?
If this would work, which operation should I use on the embeddings to get the best result?
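To make the idea concrete, here is a rough sketch of what I have in mind, using `bert-base-uncased` through the Hugging Face transformers library. The sentences and the surface-form list are toy placeholders (in my real pipeline the lemma mapping would come from a lemmatizer), and I mean-pool wordpieces into one vector per token occurrence before averaging across occurrences:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = [
    "I walk to work every day.",
    "She walked along the river.",
    "They were walking home together.",
]
# Surface forms mapped back to the lemma "walk"; placeholder for a real lemmatizer.
walk_forms = {"walk", "walked", "walking"}

instance_vectors = []
for sent in sentences:
    enc = tokenizer(sent, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (num_wordpieces, 768)

    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    word_ids = enc.word_ids(0)  # wordpiece -> source-word index (None for special tokens)

    # Group wordpiece positions by source word so that subwords such as
    # "walk" + "##ing" are pooled into a single vector for that occurrence.
    groups = {}
    for pos, wid in enumerate(word_ids):
        if wid is not None:
            groups.setdefault(wid, []).append(pos)

    for positions in groups.values():
        word = "".join(tokens[p].removeprefix("##") for p in positions)
        if word in walk_forms:
            instance_vectors.append(hidden[positions].mean(dim=0))

# One candidate "lemma embedding": the mean over all occurrences of "walk".
lemma_embedding = torch.stack(instance_vectors).mean(dim=0)
print(lemma_embedding.shape)  # torch.Size([768])
```

With 500 occurrences instead of three, the last step would simply average 500 such instance vectors. Is the mean the right pooling operation here, or would something else work better?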
Tags: corpus, bert, word-embeddings
Category: Data Science