How to train custom word2vec embeddings to find related articles?

I am a beginner in machine learning. My project is to build an AI-based search engine that shows related articles when someone searches the website. For this I decided to train my own embeddings. I found two methods: one is to train the network to predict the next word (i.e. inputs = [the quick, the quick brown, the quick brown fox] and outputs = [brown, fox, lazy]); the other is to train on pairs of nearby words (i.e. [brown,fox], [brown,quick], [brown,quick]). Which method should I use, and after training how should I …
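A minimal sketch of the second approach (training on nearby word pairs, i.e. skip-gram) using Gensim; the toy corpus and parameter values below are placeholders, not recommendations:

    from gensim.models import Word2Vec

    corpus = [
        ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
        ["a", "fast", "brown", "fox", "leaps", "over", "a", "sleepy", "dog"],
    ]

    # sg=1 selects skip-gram (predict context words from the centre word);
    # sg=0 would select CBOW (predict the centre word from its context).
    model = Word2Vec(sentences=corpus, vector_size=100, window=2,
                     min_count=1, sg=1, epochs=50)

    # After training, an article vector can be built by averaging its word
    # vectors, and related articles found via cosine similarity.
    print(model.wv.most_similar("fox", topn=3))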
Category: Data Science

How to get vector representations (or embeddings) of time series?

Even though a time series is made up of numbers only, finding an abstract fixed-dimensional vector representation would be interesting for classification/clustering purposes. Just as we can learn abstract representations/embeddings of text and images, can we do something similar with time series? Such representations could give better clustering and related results than traditional approaches based on statistical measures like Pearson correlation. All thoughts are welcome.
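One common way to get such fixed-dimensional vectors is a sequence autoencoder whose bottleneck is used as the embedding; a sketch with illustrative shapes and placeholder data:

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    # Toy data: 1000 univariate series of length 128 (placeholder only).
    x = np.random.randn(1000, 128, 1).astype("float32")

    inp = layers.Input(shape=(128, 1))
    z = layers.LSTM(32)(inp)                        # 32-dim embedding (bottleneck)
    dec = layers.RepeatVector(128)(z)
    dec = layers.LSTM(32, return_sequences=True)(dec)
    out = layers.TimeDistributed(layers.Dense(1))(dec)

    autoencoder = Model(inp, out)
    encoder = Model(inp, z)                         # this sub-model yields the embeddings
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(x, x, epochs=5, batch_size=64)

    embeddings = encoder.predict(x)                 # shape (1000, 32), usable for clustering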
Category: Data Science

Transformer time series classification using time2vec positional embedding

I want to use a transformer model to classify fixed-length time series. I was following this Keras tutorial, which uses Time2Vec as a positional embedding. According to the original Time2Vec paper the representation is calculated as $$ \boldsymbol{t2v}(\tau)[i] = \begin{cases} \omega_i \tau + \phi_i,& i = 0\\ F(\omega_i \tau + \phi_i), & 1 \leq i \leq k \end{cases} $$ The tutorial simply concatenates this embedding with the input. Now, I understand the intention of the …
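For reference, a rough Keras implementation of that formula with sin as the periodic function F (the paper allows other choices); this is a sketch, not the tutorial's exact layer:

    import tensorflow as tf
    from tensorflow.keras import layers

    class Time2Vec(layers.Layer):
        """t2v(tau)[0] = w0*tau + b0 (linear); t2v(tau)[1..k] = sin(w*tau + b)."""
        def __init__(self, k, **kwargs):
            super().__init__(**kwargs)
            self.k = k

        def build(self, input_shape):
            # input expected as (batch, steps, 1): a scalar time value per step
            self.w0 = self.add_weight(name="w0", shape=(1, 1), initializer="random_uniform")
            self.b0 = self.add_weight(name="b0", shape=(1, 1), initializer="random_uniform")
            self.w = self.add_weight(name="w", shape=(1, self.k), initializer="random_uniform")
            self.b = self.add_weight(name="b", shape=(1, self.k), initializer="random_uniform")

        def call(self, tau):
            linear = tau * self.w0 + self.b0               # i = 0 term
            periodic = tf.sin(tau * self.w + self.b)       # 1 <= i <= k terms
            return tf.concat([linear, periodic], axis=-1)  # (batch, steps, k+1)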
Category: Data Science

Can I get un-normalized vectors from the TF USE model?

I'm using this Universal Sentence Encoder (USE) model to get embeddings of a set of texts, each text corresponding to a newspaper article. In order to build a Recommender System, I generate user embeddings by averaging the embeddings of items a user has read, and then I look for other texts that are cosine-similar to this user (basically, the method returns a set of items that are similar to this user embedding). Now, the problem is that the mentioned model …
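A sketch of the recommender step described above; the TF Hub handle and article texts are placeholder assumptions:

    import numpy as np
    import tensorflow_hub as hub

    # Assumed handle for USE v4; adjust to the version you actually use.
    use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

    articles = ["article text one ...", "article text two ...", "article text three ..."]
    item_emb = use(articles).numpy()            # USE outputs are commonly reported to be near unit norm

    user_emb = item_emb[:2].mean(axis=0)        # user embedding = mean of the articles the user read

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    scores = [cosine(user_emb, e) for e in item_emb]   # rank items by similarity to the user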
Category: Data Science

Embedding from a Transformer-based model for a paragraph or document (like Doc2Vec)

I have a dataset containing sequences of different lengths; on average the sequence length is 600. The dataset looks like this:

    S1 = ['Walk','Eat','Going school','Eat','Watching movie','Walk'......,'Sleep']
    S2 = ['Eat','Eat','Going school','Walk','Walk','Watching movie'.......,'Eat']
    .........................................
    .........................................
    S50 = ['Walk','Going school','Eat','Eat','Watching movie','Sleep',.......,'Walk']

The number of unique actions in the dataset is fixed, which means some sequences may not contain all of the actions. Using Doc2Vec (the Gensim library in particular), I was able to extract an embedding for each of the sequences …
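For reference, a minimal Gensim Doc2Vec sketch of the step described (the sequences here are abbreviated placeholders):

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    sequences = [
        ["Walk", "Eat", "Going school", "Eat", "Watching movie", "Walk", "Sleep"],
        ["Eat", "Eat", "Going school", "Walk", "Walk", "Watching movie", "Eat"],
    ]
    docs = [TaggedDocument(words=seq, tags=[f"S{i+1}"]) for i, seq in enumerate(sequences)]

    model = Doc2Vec(documents=docs, vector_size=64, window=5, min_count=1, epochs=40)
    seq_embedding = model.dv["S1"]      # fixed-length vector for sequence S1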
Category: Data Science

Discriminator of a Conditional GAN with continuous labels

OK, let's say we have well-labeled images with non-discrete labels such as brightness or size, and we want to generate images conditioned on those labels. With a discrete label it could be done like this:

    def forward(self, inputs, label):
        self.batch = inputs.size(0)
        h = self.res1(inputs)
        h = self.attn(h)
        ...
        h = self.res5(h)
        h = torch.sum((F.leaky_relu(h, 0.2)).view(self.batch, -1, 4*4), dim=2)
        outputs = self.fc(h)
        if label is not None:
            embed = self.embedding(label)
            outputs += torch.sum(embed*h, dim=1, keepdim=True)

The embedding can be made to …
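One common workaround (a sketch, not the asker's code) is to replace the discrete nn.Embedding lookup with a small learned projection of the continuous label, keeping the same projection-discriminator structure:

    import torch
    import torch.nn as nn

    class ContinuousProjection(nn.Module):
        """Maps a continuous label (e.g. brightness) into the same space as h."""
        def __init__(self, feature_dim):
            super().__init__()
            # nn.Linear replaces nn.Embedding; accepts a float label of shape (batch, 1)
            self.proj = nn.Linear(1, feature_dim)

        def forward(self, h, label):
            embed = self.proj(label)                          # (batch, feature_dim)
            return torch.sum(embed * h, dim=1, keepdim=True)  # projection term added to the logit

    # usage inside the discriminator's forward, mirroring the snippet above:
    # outputs = self.fc(h)
    # outputs += self.cont_proj(h, label)   # label: float tensor of shape (batch, 1)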
Category: Data Science

TextVectorization and Autoencoder for feature extraction of text

I'm trying to solve the following problem: I need to train an autoencoder to extract useful features from text; I will then use the trained autoencoder in another model for feature extraction. The goal is to teach the autoencoder to compress the information and then reconstruct exactly the same string, and I treat reconstruction as a classification problem for each letter. My dataset:

    X_train_autoencoder_raw:
    15298    some text...
    1127     some text...
    22270    more text...
    ...
    Name: data, Length: 28235, dtype: object …
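A rough sketch of that setup, assuming character-level reconstruction treated as per-character classification (the layer sizes, sequence length, and toy data below are assumptions):

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    texts = ["15298 some text...", "1127 some text...", "22270 more text..."]

    max_len = 64
    vectorizer = layers.TextVectorization(split="character", output_mode="int",
                                          output_sequence_length=max_len)
    vectorizer.adapt(texts)
    vocab_size = vectorizer.vocabulary_size()
    x = vectorizer(tf.constant(texts))                    # (n, max_len) integer character ids

    inp = layers.Input(shape=(max_len,), dtype="int64")
    e = layers.Embedding(vocab_size, 32, mask_zero=True)(inp)
    z = layers.LSTM(64)(e)                                # compressed representation
    d = layers.RepeatVector(max_len)(z)
    d = layers.LSTM(64, return_sequences=True)(d)
    out = layers.TimeDistributed(layers.Dense(vocab_size, activation="softmax"))(d)

    autoencoder = Model(inp, out)
    autoencoder.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    autoencoder.fit(x, x, epochs=5)                        # reconstruct character ids from ids

    encoder = Model(inp, z)                                # reuse for feature extraction later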
Category: Data Science

Keras: Softmax output into embedding layer

I'm trying to build an encoder-decoder network in Keras to generate a sentence of a particular style. As my problem is unsupervised, i.e. I don't have ground truths for the generated sentences, I use a classifier to help during training: I pass the decoder's output into the classifier to tell me what style the decoded sentence is. The decoder outputs a softmax distribution, which I was intending to feed straight into the classifier, but I realised that it has …
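One common way around the non-differentiable argmax (a sketch, not necessarily the right fix for this exact architecture) is to feed the classifier the expected embedding, i.e. the softmax distribution multiplied by the classifier's embedding matrix:

    import tensorflow as tf

    # probs: decoder output, shape (batch, seq_len, vocab_size) -- a softmax distribution
    # embedding_matrix: the classifier's embedding weights, shape (vocab_size, emb_dim)
    def soft_embed(probs, embedding_matrix):
        # Weighted average of all word embeddings instead of a hard lookup;
        # gradients can flow from the classifier back into the decoder.
        return tf.matmul(probs, embedding_matrix)   # (batch, seq_len, emb_dim)

    # usage (assuming the classifier's Embedding layer is named "embedding"):
    # emb = soft_embed(decoder_probs, classifier.get_layer("embedding").embeddings)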
Category: Data Science

Triplet loss - what threshold to use to detect similarity between two embeddings?

I have trained a triplet-loss model using FaceNet's architecture on the 11k Hands dataset. Now I want to see how well my model performs, so I feed it two images of the same class and get back their embeddings. I want to compare the distance between these embeddings: if that distance is not larger than some threshold, I can say the model correctly classifies these two images as belonging to the same class. How do I select the …
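A common way to pick the threshold (a sketch, assuming you have a labelled validation set) is to compute distances for known same-class and different-class pairs and sweep candidate thresholds for the best accuracy:

    import numpy as np

    # pos_dists: distances between embeddings of same-class pairs
    # neg_dists: distances between embeddings of different-class pairs
    def best_threshold(pos_dists, neg_dists, num_steps=200):
        candidates = np.linspace(min(pos_dists.min(), neg_dists.min()),
                                 max(pos_dists.max(), neg_dists.max()), num_steps)
        best_t, best_acc = None, -1.0
        for t in candidates:
            tp = np.sum(pos_dists <= t)     # same-class pairs accepted
            tn = np.sum(neg_dists > t)      # different-class pairs rejected
            acc = (tp + tn) / (len(pos_dists) + len(neg_dists))
            if acc > best_acc:
                best_t, best_acc = t, acc
        return best_t, best_acc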
Category: Data Science

Generalize min-max scaling to vectors

I am combining several vectors, where each vector is a certain kind of embedding of some object. Since the embeddings are on very different scales (some have all components in $[0, 1]$, some have components around 60 or 70, etc.), I want to rescale the vectors before combining them. I thought about using something like min-max rescaling, but I'm not sure how to generalize it to vectors. I could do something of the sort $\frac{v-|v_{min}|}{|v_{max}|-|v_{min}|}$ but I …
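One interpretation (a sketch, assuming the min and max are taken component-wise over all vectors of the same embedding type) is to min-max scale each embedding type separately before concatenating:

    import numpy as np

    def minmax_scale_rows(X, eps=1e-12):
        """Component-wise min-max scaling over a set of vectors X of shape (n, d)."""
        lo, hi = X.min(axis=0), X.max(axis=0)
        return (X - lo) / (hi - lo + eps)

    # emb_a, emb_b: two embeddings of the same objects, on very different scales (toy data)
    emb_a = np.random.rand(100, 16)           # components already in [0, 1]
    emb_b = 70.0 * np.random.rand(100, 32)    # components up to ~70

    combined = np.hstack([minmax_scale_rows(emb_a), minmax_scale_rows(emb_b)])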
Category: Data Science

Is there a sensible notion of 'character embeddings'?

There are several popular word embeddings available (e.g., fastText and GloVe). In short, those embeddings are a tool to encode words along with a sensible notion of semantics attached to them (i.e. words with similar semantics are nearly parallel). Question: Is there a similar notion of character embedding? By 'character embedding' I mean an algorithm that allows us to encode characters in order to capture some syntactic similarity (i.e. similarity of character shapes or contexts).
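For the "similar contexts" sense, one minimal experiment is to treat each character as a token and train a small skip-gram model over character sequences (a sketch with placeholder text, not a standard reference implementation):

    from gensim.models import Word2Vec

    texts = ["character embeddings capture context", "similar characters occur in similar contexts"]
    char_sequences = [list(t) for t in texts]      # each character becomes a token

    model = Word2Vec(sentences=char_sequences, vector_size=16, window=3,
                     min_count=1, sg=1, epochs=100)

    # Characters that appear in similar contexts end up with similar vectors.
    print(model.wv.most_similar("a", topn=5))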
Category: Data Science

What are the differences between Knowledge Graph Embeddings (KGE) and Graph Neural Networks (GNN)?

From page 3 of this paper, Knowledge Graph Embeddings and Explainable AI, the authors write: "Note that knowledge graph embeddings are different from Graph Neural Networks (GNNs). KG embedding models are in general shallow and linear models and should be distinguished from GNNs [78], which are neural networks that take relational structures as inputs." However, it's still vague to me. It seems that we can get embeddings from both of them. What is the difference? How should we choose …
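To make "shallow and linear" concrete, a KGE model such as TransE just learns one vector per entity and relation and scores a triple directly, with no message passing over the graph; a toy sketch with made-up entities:

    import numpy as np

    dim, rng = 50, np.random.default_rng(0)
    entity_emb = {"Paris": rng.normal(size=dim), "France": rng.normal(size=dim)}
    relation_emb = {"capital_of": rng.normal(size=dim)}

    def transe_score(head, relation, tail):
        # TransE: a triple (h, r, t) is plausible if h + r lands close to t.
        return -np.linalg.norm(entity_emb[head] + relation_emb[relation] - entity_emb[tail])

    print(transe_score("Paris", "capital_of", "France"))
    # A GNN, by contrast, computes node representations by aggregating neighbour
    # features through neural layers and uses those representations downstream.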
Category: Data Science

A way to initialize sentence embeddings for unsupervised text clustering, better than GloVe word vectors?

For unsupervised text clustering, the key thing is the initial embedding of the text. If we want to use DeepCluster for text, the problem is how to get this initial embedding from a deep model; BERT does not give a good initial embedding out of the box. If we do not use a deep model, is there a better way to get embeddings than averaged GloVe word vectors?
Category: Data Science

How are the embedding and context matrices created and updated in word embedding?

I am struggling to understand how word embedding works, especially how the embedding matrix $W$ and context matrix $W'$ are created and updated. I understand that the input may be a one-hot encoding of a given word $x_i$, and that the output may be the word most likely to appear near $x_i$. Would you have a very simple mathematical example?
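A tiny numerical sketch of one skip-gram update with a softmax output, showing how both $W$ and $W'$ get adjusted for a single (centre, context) pair; sizes and indices are made up for illustration:

    import numpy as np

    V, d, lr = 5, 3, 0.1                        # vocab size, embedding dim, learning rate
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(V, d))      # embedding matrix (input vectors)
    W_ctx = rng.normal(scale=0.1, size=(d, V))  # context matrix W' (output vectors)

    center, context = 2, 4                      # word indices for one training pair

    # forward: one-hot(center) @ W just selects row `center` of W
    h = W[center]                               # hidden layer, shape (d,)
    scores = h @ W_ctx                          # shape (V,)
    y = np.exp(scores) / np.exp(scores).sum()   # softmax over the vocabulary

    # backward: cross-entropy gradient is (predicted - one-hot(context))
    e = y.copy()
    e[context] -= 1.0
    grad_h = W_ctx @ e                          # gradient w.r.t. the hidden vector
    W_ctx -= lr * np.outer(h, e)                # update the context matrix W'
    W[center] -= lr * grad_h                    # update only the centre word's row of W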
Category: Data Science

Are there any graph embedding algorithms like this already?

I wrote an algorithm for generating node embeddings based on the graph's topology. Most of the explanation is done in the readme file and the examples. The question is: Am I reinventing the wheel? Does this approach have any practical advantages over existing solutions for embeddings generation? Yes, I'm aware there are many algorithms for this based on random walks, but this one is pure deterministic linear algebra and it is quite simple, from my perspective. In short, the algorithm …
Category: Data Science

Key generation from feature vectors in high dimensions

I welcome any suggestions for the following hard problem: I have a dataset of float feature vectors of size 512, where each feature vector is extracted from a face image. I want to generate a key from a given feature vector (this key can be a number, binary code, etc.) that is consistent for each person, without comparisons between feature vectors. The only input I have is the given feature vector. For example, if I see a photo of me I want …
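One building block that matches "binary code without pairwise comparisons" is random-hyperplane hashing: project the 512-dim vector onto fixed random directions and keep the signs. This is only a sketch: it produces identical keys only when two feature vectors fall on the same side of every hyperplane, so small feature changes can still flip bits.

    import numpy as np

    rng = np.random.default_rng(42)             # fixed seed so the key is reproducible
    hyperplanes = rng.normal(size=(64, 512))    # 64-bit code from 512-dim features

    def feature_to_key(feature_vec):
        bits = (hyperplanes @ feature_vec) > 0
        return "".join("1" if b else "0" for b in bits)

    key = feature_to_key(np.random.rand(512))   # placeholder feature vector
    print(key)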
Category: Data Science

Why does averaging word embedding vectors (extracted from the NN embedding layer) work to represent sentences?

I'm puzzled about why averaging word embeddings works to obtain a sentence embedding, in particular considering the exercise in this post: How to obtain vector representation of phrases using the embedding layer and do PCA with it. My actual question is to understand the theory behind that more practical post. The answer to the linked question uses a method for sentence embedding that averages the word embeddings (in the most naive and simplest …
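For concreteness, the method under discussion is just this (a sketch with a made-up two-dimensional lookup table standing in for an embedding layer or GloVe):

    import numpy as np

    # toy lookup table: word -> embedding vector
    emb = {"the": np.array([0.1, 0.3]),
           "cat": np.array([0.9, -0.2]),
           "sat": np.array([0.4, 0.5])}

    def sentence_embedding(tokens):
        # naive sentence representation: component-wise mean of the word vectors
        return np.mean([emb[t] for t in tokens], axis=0)

    print(sentence_embedding(["the", "cat", "sat"]))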
Category: Data Science

How to extract embeddings of categorical variables

I am a little bit confused about encoding categorical variables. There are other posts/blog posts on this issue, but none addresses the problem I am facing. I have a dataset with mixed variables (i.e., numerical as well as categorical). Some of the categorical variables have a lot of categories (close to 100), so instead of using one-hot encoding I am looking into using embeddings. My goal is to use the embeddings of the categorical variables and extract them and …
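A minimal sketch of learning an entity embedding for one high-cardinality categorical column with Keras and then extracting the learned vectors; the column sizes, target, and data below are placeholders:

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    n_categories, emb_dim = 100, 8
    cat = np.random.randint(0, n_categories, size=(1000, 1))   # integer-encoded categorical column
    num = np.random.randn(1000, 3)                              # numerical columns
    y = np.random.randint(0, 2, size=(1000,))                   # placeholder binary target

    cat_in = layers.Input(shape=(1,))
    num_in = layers.Input(shape=(3,))
    emb = layers.Embedding(n_categories, emb_dim, name="cat_embedding")(cat_in)
    emb = layers.Flatten()(emb)
    h = layers.Concatenate()([emb, num_in])
    h = layers.Dense(32, activation="relu")(h)
    out = layers.Dense(1, activation="sigmoid")(h)

    model = Model([cat_in, num_in], out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit([cat, num], y, epochs=3, verbose=0)

    # one row per category; reusable as features in another model
    embedding_matrix = model.get_layer("cat_embedding").get_weights()[0]   # shape (100, 8)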
Category: Data Science

Graph embeddings of Wikidata items

I'm trying to use PyTorch BigGraph pre-trained embeddings of Wikidata items for disambiguation. The problem is that the results I am getting by using dot (or cosine) similarity are not great. For example, the similarity between the Python programming language and the snake with the same name is greater than between Python and Django. Does anybody know if there is a Wikidata embedding that results in better similarities? The only alternative I've found is Webmembedder embeddings but they are incomplete. …
Category: Data Science

How to choose a good number of dimensions for an autoencoder?

I'm using an autoencoder for feature extraction, and I am stuck on how to choose a good number of dimensions for the encoder (latent) layer. After training on the dataset, the model gives a latent (embedding) layer with some zero values in the resulting vectors. For example, with a 4-dimensional embedding layer, one unit of the embedding layer has the value [0.67 0.0 2.13 0.43], whereas I expected all 4 values to be different from zero. I think my problem is that I chose too many …
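One practical (if crude) way to pick the latent size is to train the same autoencoder with several bottleneck widths and compare validation reconstruction error, then pick the elbow of that curve; a sketch with placeholder data and sizes:

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    x = np.random.rand(2000, 30).astype("float32")     # placeholder feature matrix

    def build_autoencoder(latent_dim, input_dim=30):
        inp = layers.Input(shape=(input_dim,))
        # note: a ReLU bottleneck can legitimately output exact zeros for some units
        z = layers.Dense(latent_dim, activation="relu")(inp)
        out = layers.Dense(input_dim)(z)
        model = Model(inp, out)
        model.compile(optimizer="adam", loss="mse")
        return model

    for latent_dim in [2, 4, 8, 16]:
        model = build_autoencoder(latent_dim)
        hist = model.fit(x, x, epochs=20, validation_split=0.2, verbose=0)
        print(latent_dim, hist.history["val_loss"][-1])  # compare and pick the elbow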
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.