FastText Model Explained

I was reading the FastText paper and I have a few questions about the model used for classification. Since I am not from an NLP background, I am unfamiliar with some of the jargon. In the figure, what exactly are the $x_i$? I am not sure what the $N$ n-gram features mean. If my document has a total of $L$ words, how can I represent the entire document using $N$ variables ($x_1, \dots, x_N$)? What exactly is $N$?

$$-\frac{1}{N}\sum_{n=1}^{N} y_n \log(f(BAx_n))$$ If $y_n$ is the label, what sense does it make to multiply it with the output vector after the softmax (the labels would be something like 0, 1, 2, 3, ...)? Or does the author mean that we take the $y_n$-th component of the output vector in the loss calculation?

Topic: fasttext, ngrams, nlp

Category: Data Science


The formula makes sense if $y_n$ is a row vector representing the one-hot encoding of the label, and the multiplication is with the single-column matrix $\log(f(BAx_n))$ containing the log-probabilities over all the classes given by the softmax function $f$.
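Here is a minimal numerical sketch of that reading of the loss (the dimensions and random matrices are made up for illustration; they are not taken from the paper). It shows that multiplying the one-hot row vector $y_n$ with the log-softmax column is the same as simply picking out the $y_n$-th component:

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical dimensions: 4 classes, 10-dim hidden layer, 50 n-gram features
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 50))     # feature embedding matrix
B = rng.normal(size=(4, 10))      # classifier weights
x_n = rng.random(50)              # bag-of-n-gram feature vector for one document

p = softmax(B @ A @ x_n)          # f(B A x_n): predicted class distribution

label = 2                         # integer label for this document
y_n = np.eye(4)[label]            # one-hot row vector

# The two forms of the per-document loss are identical:
loss_dot = -y_n @ np.log(p)       # one-hot row vector times log-softmax column
loss_idx = -np.log(p[label])      # equivalently, the y_n-th component
assert np.isclose(loss_dot, loss_idx)
```

So writing $y_n \log(f(BAx_n))$ with a one-hot $y_n$ is just the usual cross-entropy loss for the correct class.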

As for $x_n$, it must of course be a vector as well, representing the bag of n-gram features of the $n$-th document.
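To make that concrete, here is a rough sketch of how a document of $L$ words can be mapped to a fixed-length feature vector. The bucket count, the use of Python's built-in `hash`, and the unigram-plus-bigram choice are all assumptions for illustration, not the actual fastText implementation (which uses its own hashing of character and word n-grams):

```python
import numpy as np

BUCKETS = 1000                    # hypothetical size of the hashed feature vocabulary

def doc_to_features(words, n=2, buckets=BUCKETS):
    """Represent a document of L words as a bag of unigram + bigram features,
    hashed into a fixed number of buckets and normalized (a stand-in for x_n)."""
    grams = list(words)
    for i in range(len(words) - n + 1):
        grams.append(" ".join(words[i:i + n]))   # word n-grams (here bigrams)
    x = np.zeros(buckets)
    for g in grams:
        x[hash(g) % buckets] += 1.0              # hashing trick for the feature index
    return x / max(len(grams), 1)                # normalized bag of features

x_n = doc_to_features("the cat sat on the mat".split())
print(x_n.shape, x_n.sum())       # fixed-length vector regardless of document length L
```

This is why the number of inputs in the figure does not depend on $L$: every document, whatever its length, ends up as a vector over the same fixed set of (hashed) n-gram features.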
