Understanding fastText

fastText is Facebook's open-source software for obtaining word embeddings (the original paper). Given a document indexed by $n$ and represented by a list of n-gram vectors $\{x_1, x_2, \cdots, x_N\}$, the objective their system tries to optimize is

$$ -\frac{1}{N} \sum_{n=1}^N y_n \log(f(BA x_n)) $$

where $B$ and $A$ are weight matrices (the factorization is for performance reasons), $y_n$ is the class label, and $f(\cdot)$ is the softmax function.
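To make my reading concrete, here is a minimal NumPy sketch of the objective as I understand it. The sizes, the one-hot encoding of the n-grams, and the one-hot label $y$ are my own assumptions for illustration, not anything taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n-gram vocabulary, embedding dimension, number of classes.
V, d, K = 1000, 10, 5

A = rng.normal(size=(d, V))   # first factor of the weight matrix (embedding lookup)
B = rng.normal(size=(K, d))   # second factor (linear classifier)

def softmax(z):
    z = z - z.max()           # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# One document given as N one-hot n-gram vectors x_1..x_N and a one-hot label y.
N = 8
ngram_ids = rng.integers(0, V, size=N)
X = np.eye(V)[ngram_ids]      # shape (N, V)
y = np.eye(K)[2]              # true class is 2 (arbitrary choice)

# Objective as written above: average the per-n-gram cross-entropy losses.
loss_per_ngram = -np.mean([y @ np.log(softmax(B @ A @ x)) for x in X])
print(loss_per_ngram)
```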

Despite the empirical gains reported in the paper, I find this formulation quite unusual, since generally we first obtain a representation of the entire document. For example, if I use the average of the n-gram vectors as the document representation, i.e. $\frac{1}{N} \sum_{n=1}^N x_n$, then the objective I would use is

$$ -y_n \log(f(BA \frac{1}{N} \sum_{n=1}^N x_n)) $$
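Continuing the sketch above (same `A`, `B`, `X`, and `y`), the average-first variant I have in mind would be computed like this:

```python
# Average the n-gram vectors first to get a single document representation,
# then apply the same factorized linear classifier and softmax once.
x_bar = X.mean(axis=0)                           # shape (V,)
loss_avg_first = -(y @ np.log(softmax(B @ A @ x_bar)))
print(loss_avg_first)
```

Because $\log(f(\cdot))$ is nonlinear, the two quantities are not the same in general, which is why the choice matters to me.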

So is my understanding of fastText correct? What is their rationale for doing it this way?

Topic: representation, word-embeddings, nlp

Category: Data Science
