I was working through the "Attention Is All You Need" paper, and while the motivation for positional encodings makes sense and the other Stack Exchange answers filled me in on the motivation for their structure, I still don't understand why $1/10000$ was used as the scaling factor for the $pos$ of a word. Why was this number chosen?
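For reference, a minimal sketch of the sinusoidal encoding the question is about, showing where the $10000$ base enters (shapes and sizes are illustrative):

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model, base=10000.0):
    """PE[pos, 2i] = sin(pos / base^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    pos = np.arange(max_len)[:, None]          # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model/2)
    angle = pos / base ** (2 * i / d_model)    # each frequency scales pos by 1/base^(2i/d)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                # even dimensions
    pe[:, 1::2] = np.cos(angle)                # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=512)
print(pe.shape)  # (50, 512)
```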
I am a beginner in machine learning. My project is to build an AI-based search engine that shows related articles when someone searches on the website. For this I decided to train my own embeddings. I found two methods for this: one is to train a network to predict the next word (i.e. inputs=[the quick, the quick brown, the quick brown fox] and outputs=[brown, fox, lazy]); the other is to train on nearby words (i.e. [brown,fox], [brown,quick], [brown,quick]). Which method should I use, and after training how should I …
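The second setup is essentially what word2vec's skip-gram/CBOW training does; a minimal sketch with gensim (the corpus and parameters are placeholders):

```python
from gensim.models import Word2Vec

# Toy corpus: in practice each article would be a list of tokens.
sentences = [
    ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
    ["a", "fast", "brown", "fox", "leaps", "over", "a", "sleepy", "dog"],
]

# sg=1 -> skip-gram (predict nearby words from the centre word); sg=0 -> CBOW.
model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=1, epochs=50)

print(model.wv.most_similar("fox", topn=3))  # nearest words by cosine similarity
```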
I want to fine-tune BERT by training it on a domain dataset of my own. The domain is specific and includes many terms that probably weren't in the corpus BERT was originally trained on. I know I have to use BERT's tokenizer, as the model was originally trained on its embeddings. To my understanding, words unknown to the tokenizer will be mapped to [UNK]. What if some of these words are common in my dataset? Does it make sense …
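One common approach for frequent domain terms is to add them to the tokenizer and resize the model's embedding matrix before fine-tuning; a sketch with the Hugging Face transformers library (the added tokens are made-up examples):

```python
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical domain-specific terms that the stock vocabulary handles badly.
new_tokens = ["electrocardiogram", "myocarditis"]
num_added = tokenizer.add_tokens(new_tokens)

# Grow the embedding matrix; the new rows are randomly initialised
# and get trained during fine-tuning on the domain corpus.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens, vocab size is now {len(tokenizer)}")
```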
How can words be clustered into groups of similar meaning (synonyms)? I started with pre-trained word embeddings (e.g., Google News), which are great but not perfect: a limitation arises because the word embeddings are based on surrounding words, and this introduces challenging results. For example, polar meanings: word embeddings may find opposites to be similar. Even though such words mean the opposite semantically, they can quite readily be interchanged given the same preceding and following words. For example, "terrible" and …
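A minimal sketch of the clustering step itself, assuming the Google News vectors via gensim's downloader and k-means over a small word list (the word list and number of clusters are illustrative):

```python
import gensim.downloader as api
from sklearn.cluster import KMeans

wv = api.load("word2vec-google-news-300")  # pre-trained Google News embeddings

words = ["terrible", "awful", "great", "excellent", "car", "truck"]
vectors = [wv[w] for w in words]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vectors)
for word, label in zip(words, kmeans.labels_):
    print(word, label)  # antonyms like "terrible"/"great" may still land in one cluster
```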
I have a school project and need to use the embeddings generated by BERT (for example, mBERT) with a classifier such as SVM or CNN. Any help, please. Thank you!
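A minimal sketch of one way to do this, assuming the transformers library for feature extraction and scikit-learn's SVM on top (the texts and labels are toy placeholders):

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import SVC

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")  # mBERT
bert = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(texts):
    """Return one [CLS] vector per text (features only, no fine-tuning)."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**enc)
    return out.last_hidden_state[:, 0, :].numpy()  # [CLS] token embedding

texts = ["great movie", "terrible movie"]          # toy training data
labels = [1, 0]

clf = SVC(kernel="linear").fit(embed(texts), labels)
print(clf.predict(embed(["what a great film"])))
```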
I'm currently trying to find a way of loading/deserializing a .json file containing Flair word embeddings that is too large to fit in my RAM at once (a >60 GB .json with 32 GB of RAM). My current code for loading the embedding is below.

    import json
    import numpy as np
    import tensorflow as tf

    def get_embedding_table(config):
        # Loads the entire .json into memory at once -- this is what exceeds RAM.
        words_id2vec = json.load(open(config.words_id2vector_filename, 'r'))
        words_vectors = [0] * len(words_id2vec)
        for word_id, vec in words_id2vec.items():   # place each vector at its word id
            words_vectors[int(word_id)] = vec
        # One extra randomly initialised vector (e.g. for unknown words).
        words_vectors.append(list(np.random.uniform(0, 1, config.embedding_dim)))
        words_embedding_table = tf.Variable(
            name='words_emb_table', initial_value=words_vectors, dtype=tf.float32)
        return words_embedding_table

The rest of the code that I am trying to reproduce …
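One option worth trying is to stream the key/value pairs instead of loading the whole object at once, for example with the ijson library and a pre-allocated NumPy array. A rough sketch, under the assumption that the file is a single top-level JSON object mapping ids to vectors (vocab_size and the config attributes mirror the code above):

```python
import ijson
import numpy as np
import tensorflow as tf

def get_embedding_table_streaming(config, vocab_size):
    # Pre-allocate the final float32 array so vectors never pile up as Python lists.
    table = np.zeros((vocab_size + 1, config.embedding_dim), dtype=np.float32)
    with open(config.words_id2vector_filename, 'rb') as f:
        # Iterate over the top-level "id": [vector] pairs without loading the whole file.
        for word_id, vec in ijson.kvitems(f, ''):
            table[int(word_id)] = np.asarray(vec, dtype=np.float32)
    table[vocab_size] = np.random.uniform(0, 1, config.embedding_dim)  # unknown-word row
    return tf.Variable(table, name='words_emb_table')
```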
I am trying to train a doc2vec model on user browsing history (URLs tagged with user_id). I use the Chainer deep learning framework. There are more than 20 million embeddings (user_ids and URLs) to initialize, which don't fit in GPU memory (maximum available 12 GB). Training on the CPU is very slow. I am attempting this using the Chainer code given here. Please advise on any options to try.
I'm currently working on the task of measuring semantic proximity between sentences. I use fasttext train_unsupervised (skipgram) for this. I extract the sentence embeddings and then measure the cosine similarity between them. However, I ran into the following problem: the cosine similarity between the embeddings of the sentences "Create a documentation of product A" and "he is creating a documentation of product B" is very high (>0.9). Obviously this is because both of them are about creating documentation. But however the first …
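For reference, a minimal sketch of the pipeline described, assuming the official fasttext Python bindings and a plain-text training file (the file name is a placeholder):

```python
import fasttext
import numpy as np

# Train an unsupervised skip-gram model on a plain-text corpus (one sentence per line).
model = fasttext.train_unsupervised("corpus.txt", model="skipgram", dim=100)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = model.get_sentence_vector("Create a documentation of product A")
s2 = model.get_sentence_vector("he is creating a documentation of product B")
print(cosine(s1, s2))  # tends to be high: averaged word vectors blur what differs
```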
I'm using this Universal Sentence Encoder (USE) model to get embeddings for a set of texts, each text corresponding to a newspaper article. In order to build a recommender system, I generate user embeddings by averaging the embeddings of the items a user has read, and then I look for other texts that are cosine-similar to this user embedding (basically, the method returns a set of items that are similar to this user embedding). Now, the problem is that the mentioned model …
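A minimal sketch of that averaging-plus-cosine setup, assuming the TF Hub release of USE (the module URL/version, the articles, and the read history are placeholders):

```python
import numpy as np
import tensorflow_hub as hub

# Assumed TF Hub module; the question refers to a specific USE model.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

articles = ["Article about the economy ...", "Article about football ...",
            "Article about elections ..."]
item_vecs = embed(articles).numpy()              # (n_items, 512)

read_idx = [0, 2]                                # items this user has read
user_vec = item_vecs[read_idx].mean(axis=0)      # user embedding = mean of read items

# Cosine similarity between the user vector and every item, then rank items.
sims = item_vecs @ user_vec / (np.linalg.norm(item_vecs, axis=1) * np.linalg.norm(user_vec))
print(np.argsort(-sims))
```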
In two-layer perceptrons that slide across words of text, such as word2vec and fastText, the hidden-layer weights may be a product of two sets of random variables, such as positional embeddings and word embeddings (Mikolov et al. 2017, Section 2.2): $$v_c = \sum_{p\in P} d_p \odot u_{t+p}$$ However, it's unclear to me how best to initialize the two variables. When only word embeddings are used for the hidden-layer weights, word2vec and fastText initialize them to $\mathcal{U}(-1 / \text{fan\_out}; 1 / \text{fan\_out})$. …
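A small numerical sketch of the quoted combination and the uniform initialization mentioned, under the assumption that both $d_p$ and the word vectors are initialized the same way (the window, dimensions, and ids are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 100
P = [-2, -1, 1, 2]                      # context window offsets
bound = 1.0 / dim                        # 1 / fan_out, as described in the question

# Word embeddings u and positional embeddings d_p, both uniformly initialised.
U = rng.uniform(-bound, bound, size=(vocab_size, dim))
D = {p: rng.uniform(-bound, bound, size=dim) for p in P}

def hidden(word_ids, t):
    """v_c = sum over p of the elementwise product d_p * u_{t+p}."""
    return sum(D[p] * U[word_ids[t + p]] for p in P)

sentence = [5, 17, 42, 7, 99]            # toy sentence as word ids
v_c = hidden(sentence, t=2)              # hidden vector for the middle position
print(v_c.shape)                         # (100,)
```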
I have a scientific database with articles and coauthors. Using this database, I am training a word2vec model on the co-authors. The use case here is to disambiguate authors. I was wondering whether my approach can be improved; any suggestions will be greatly appreciated.
Code
So I am new to Deep Learning and NLP. I have read several blog posts on Medium and Towards Data Science, and papers, where they talk about pre-training the word embeddings in an unsupervised fashion and then using them in a supervised DNN. But recently I read a blog post which suggested that training the word embeddings while training the neural network gives better results. This is the other link. So my question is: which one should I follow? Some YouTube videos that I …
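For concreteness, the two setups usually differ only in how the embedding layer is configured; a minimal Keras sketch (the vocabulary size, dimensions, and the pretrained matrix are placeholders):

```python
import numpy as np
from tensorflow.keras import layers, models, initializers

vocab_size, emb_dim = 10000, 100
pretrained = np.random.rand(vocab_size, emb_dim).astype("float32")  # stand-in for GloVe/word2vec

def build(embedding_layer):
    return models.Sequential([
        embedding_layer,
        layers.GlobalAveragePooling1D(),
        layers.Dense(1, activation="sigmoid"),
    ])

# Option 1: pre-trained embeddings, kept frozen during supervised training.
frozen = layers.Embedding(vocab_size, emb_dim,
                          embeddings_initializer=initializers.Constant(pretrained),
                          trainable=False)

# Option 2: embeddings learned jointly with the task (random or pre-trained init).
learned = layers.Embedding(vocab_size, emb_dim, trainable=True)

model_a, model_b = build(frozen), build(learned)
model_a.compile(optimizer="adam", loss="binary_crossentropy")
model_b.compile(optimizer="adam", loss="binary_crossentropy")
```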
I would like to create word embeddings that take context into account, so the vector of the word Jaguar [animal] would be different from the vector of the word Jaguar [car brand]. As you know, word2vec only gives one representation for a given word, and I would like to take already pretrained embeddings and enrich them with context. So far I've tried a simple approach of taking the average of the vector of the word and the vector of a category word, for example like this. Now I would …
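A minimal sketch of the averaging idea described, assuming gensim's pre-trained vectors via the downloader (the model choice and the category/probe words are placeholders):

```python
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")   # any pre-trained KeyedVectors would do

def contextual_vector(word, category):
    """Crude sense vector: average of the word vector and a category word vector."""
    return (wv[word] + wv[category]) / 2.0

jaguar_animal = contextual_vector("jaguar", "animal")
jaguar_car = contextual_vector("jaguar", "car")

# Each averaged "sense" should sit closer to its own neighbourhood.
print(wv.cosine_similarities(jaguar_animal, [wv["tiger"], wv["bmw"]]))
print(wv.cosine_similarities(jaguar_car, [wv["tiger"], wv["bmw"]]))
```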
There are several popular word embeddings available (e.g., fastText and GloVe); in short, those embeddings are a tool to encode words along with a sensible notion of semantics attached to those words (i.e. words with similar semantics are nearly parallel). Question: is there a similar notion of character embedding? By 'character embedding' I understand an algorithm that allows us to encode characters in order to capture some syntactic similarity (i.e. similarity of character shapes or contexts).
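As a point of reference, the simplest form of this is an embedding table indexed by characters instead of words, whose vectors are then trained on some downstream or context-prediction objective; a minimal PyTorch sketch (the alphabet and dimensions are illustrative):

```python
import torch
import torch.nn as nn

# Toy character vocabulary.
alphabet = "abcdefghijklmnopqrstuvwxyz"
char2id = {c: i for i, c in enumerate(alphabet)}

emb = nn.Embedding(num_embeddings=len(alphabet), embedding_dim=16)

def embed_word(word):
    """Encode a word as a sequence of character vectors (to feed into a CNN/RNN)."""
    ids = torch.tensor([char2id[c] for c in word.lower() if c in char2id])
    return emb(ids)                     # shape: (len(word), 16)

print(embed_word("jaguar").shape)       # torch.Size([6, 16])
```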
My documents are only a single sentence long, each containing one annotation. Sentences with the same named entity are of course similar, but not context-wise. NER training examples (as far as I know) always have sequentially related documents, i.e. the next document is context-wise related to the previous one. Consider the example below: the first sentence is about the US, with location annotations; the second sentence is about an organisation but is still related to the previous one. The United States of America (LOC), commonly known as …
For unsupervised text clustering, the key thing is the initial embedding for each text. If we want to use DeepCluster for text, the problem is how to get this initial embedding from a deep model; BERT does not give good initial embeddings. If we do not use a deep model, is there a better way to get embeddings than GloVe word vectors?
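One common non-deep baseline is to average GloVe vectors into a document vector and cluster those; a minimal sketch, assuming gensim's downloader for the vectors (the corpus and number of clusters are placeholders):

```python
import numpy as np
import gensim.downloader as api
from sklearn.cluster import KMeans

wv = api.load("glove-wiki-gigaword-100")

def doc_embedding(text):
    """Average the GloVe vectors of the in-vocabulary tokens of a document."""
    vecs = [wv[t] for t in text.lower().split() if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

docs = ["the match ended in a draw", "the election results were announced",
        "the striker scored two goals", "parliament passed the new bill"]
X = np.stack([doc_embedding(d) for d in docs])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # these averaged vectors can serve as the initial embeddings for clustering
```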
I am currently researching the topic of text generation for my university project. I decided (of course) to go with an RNN that gets a sequence of tokens as input, with the target of predicting the next token given the sequence. I have been reading through a number of tutorials, and there is one thing that I am wondering about. The sources I have read, regardless of how they encode the X sequences (one-hot or word embeddings), encode the y target tokens …
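For reference, a minimal sketch of the next-token setup described, with integer-encoded inputs going through an embedding layer and integer targets used via sparse categorical cross-entropy (vocabulary size and shapes are illustrative; one-hot targets with plain categorical cross-entropy would be the equivalent alternative):

```python
import numpy as np
from tensorflow.keras import layers, models

vocab_size, seq_len, emb_dim = 5000, 20, 64

model = models.Sequential([
    layers.Embedding(vocab_size, emb_dim),
    layers.LSTM(128),
    layers.Dense(vocab_size, activation="softmax"),   # distribution over the next token
])
# Integer y targets, so no one-hot encoding of the labels is required.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy data: X holds token-id sequences, y holds the id of the token that follows each one.
X = np.random.randint(0, vocab_size, size=(100, seq_len))
y = np.random.randint(0, vocab_size, size=(100,))
model.fit(X, y, epochs=1, verbose=0)
```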
I am struggling to understand how word embeddings work, especially how the embedding matrix $W$ and context matrix $W'$ are created/updated. I understand that the input may be a one-hot encoding of a given word, and that the output may be the word most likely to be found near this word $x_i$. Would you have any very simple mathematical example?
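A tiny worked example of the forward pass, with made-up numbers: take a vocabulary $\{\text{cat}, \text{sat}, \text{mat}\}$, 2-dimensional embeddings, and

$$W = \begin{pmatrix} 0.1 & 0.3 \\ 0.2 & 0.5 \\ 0.4 & 0.1 \end{pmatrix}, \qquad W' = \begin{pmatrix} 0.2 & 0.4 & 0.1 \\ 0.6 & 0.3 & 0.5 \end{pmatrix}.$$

For the input word "cat" with one-hot vector $x = (1, 0, 0)^\top$, the hidden layer is simply the first row of $W$: $h = W^\top x = (0.1, 0.3)^\top$. The scores for each vocabulary word are $u = W'^\top h = (0.20, 0.13, 0.16)^\top$, and a softmax turns them into probabilities $\approx (0.35, 0.32, 0.33)$. Training then adjusts $W$ and $W'$ by gradient descent on the cross-entropy loss, so that the probability of the word actually observed in the context goes up.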
In the skip-gram model, the probability that a word $w$ is part of the set of context words $\{w_o^{(i)}\}$ $(i = 1:m)$, where $m$ is the size of the context window around the central word $w_c$, is given by: $$p(w_o \mid w_c) = \frac{\exp(\vec{u}_o \cdot \vec{v}_c)}{\sum_{i\in V}\exp(\vec{u}_i \cdot \vec{v}_c)}$$ where $V$ is the vocabulary of the training set, $\vec{u}_i$ is the word embedding of a context word, and $\vec{v}_c$ is the word embedding of the central word. But this type of model is defining …