One-hot vectors for a fixed vocabulary
Suppose we are given a vocabulary with $|V| = 4$, for example $V = \{\text{I}, \text{want}, \text{this}, \text{cat}\}$.
What does the bag-of-words representation with this vocabulary and one-hot encoding look like for the following example sentences?
- You are the dog here
- I am fifty
- Cat cat cat
I suppose it would look like this:
$V_1 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$
$V_2 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}$
$V_3 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$
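To check my reasoning, here is a minimal Python sketch (the function and variable names, and the lowercasing step, are my own assumptions) that builds these presence vectors from the fixed vocabulary:

```python
# Fixed vocabulary in the order {I, want, this, cat} from the example above.
vocabulary = ["I", "want", "this", "cat"]

def bag_of_words(sentence, vocab):
    """Binary bag-of-words: 1 if the vocabulary word occurs in the sentence, else 0."""
    tokens = [t.lower() for t in sentence.split()]
    return [1 if word.lower() in tokens else 0 for word in vocab]

sentences = ["You are the dog here", "I am fifty", "Cat cat cat"]
for s in sentences:
    print(s, "->", bag_of_words(s, vocabulary))

# Output:
# You are the dog here -> [0, 0, 0, 0]
# I am fifty -> [1, 0, 0, 0]
# Cat cat cat -> [0, 0, 0, 1]
```

Note that this only records presence, not counts; with term counts instead, "Cat cat cat" would become $(0, 0, 0, 3)^T$.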
But what exactly is the point of this representation? Does it show the weakness of one-hot encoding with a fixed vocabulary, or did I miss something?
Topic: bag-of-words, one-hot-encoding
Category: Data Science