Can I average the BERT embeddings of multiple instances of the same word to get one vector representation of the word?

In the project I'm working on right now, I would like to get one embedding for every unique lemma in a corpus. Could I get this by averaging the embeddings of every instance of that lemma?

For example, say there are 500 tokens of the lemma walk in the corpus, regardless of inflected form (walks, walked, walking, etc.). Could I then add, average, or concatenate these 500 embeddings to get one embedding that accurately represents all of them?

If this approach works, which operation should I use on the embeddings to get the best result?
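
For concreteness, here is a rough sketch of the averaging I have in mind, using the Hugging Face transformers library. The `lemma_embedding` helper, the hard-coded surface forms, and the toy sentences are just illustrations, and the subword handling is simplified: words that BERT splits into multiple WordPieces are averaged piece-wise first.

```python
import torch
from collections import defaultdict
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def lemma_embedding(sentences, forms):
    """Average the contextual embeddings of every occurrence of any
    surface form in `forms` across `sentences`."""
    vectors = []
    for sentence in sentences:
        enc = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
        # Group token positions by the word they belong to; fast tokenizers
        # expose this mapping via word_ids(), and [CLS]/[SEP] map to None.
        groups = defaultdict(list)
        for pos, wid in enumerate(enc.word_ids(0)):
            if wid is not None:
                groups[wid].append(pos)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
        for positions in groups.values():
            # Rebuild the surface form from its pieces ("walk", "##ing").
            word = "".join(tokens[p].lstrip("#") for p in positions)
            if word in forms:
                # Average the pieces of this one occurrence, then collect it.
                vectors.append(hidden[positions].mean(dim=0))
    # One vector per occurrence -> one mean vector for the lemma.
    return torch.stack(vectors).mean(dim=0) if vectors else None

sentences = ["She walks to work every day.", "They walked home together."]
vec = lemma_embedding(sentences, {"walk", "walks", "walked", "walking"})
print(vec.shape)  # torch.Size([768])
```

In a real run the occurrences would of course come from the corpus's lemmatization rather than a hand-written set of forms; the question is whether the final mean (or a sum, or a concatenation) is a sensible single representation.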

Tags: corpus, bert, word-embeddings

Category: Data Science
