I am experimenting with a document retrieval system in which documents are represented as vectors. When queries come in, they are converted to vectors by the same method used for the documents, and the query vector's k nearest neighbors are retrieved as the results. Each query has a known answer string. To improve performance, I am now looking to create a model that modifies the query vector. What I was looking to do was use a model …
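A minimal sketch of the setup described above, assuming NumPy document vectors and a hypothetical linear query-modification matrix `W` (a stand-in for whatever model is eventually trained against the known answers):

```python
import numpy as np

def retrieve_top_k(query_vec, doc_vectors, k=5, W=None):
    # doc_vectors: (n_docs, dim) matrix; query_vec: (dim,) from the same embedder.
    # Optionally modify the query with a learned linear transform W (dim x dim).
    if W is not None:
        query_vec = W @ query_vec
    # Cosine similarity between the (modified) query and every document
    doc_norms = np.linalg.norm(doc_vectors, axis=1)
    sims = doc_vectors @ query_vec / (doc_norms * np.linalg.norm(query_vec) + 1e-12)
    # Indices of the k nearest neighbors, most similar first
    return np.argsort(-sims)[:k]
```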
I have already seen this, this and this question, but none of the suggestions fixed my problem (so I have reverted them). I have the following code (imports added for completeness):

```python
import spacy
from spacy.lang.en import English
from sklearn.base import TransformerMixin

nlp = spacy.load('en_core_web_sm')
parser = English()

class CleanTextTransformer(TransformerMixin):
    def transform(self, X, **transform_params):
        return [cleanText(text) for text in X]

    def fit(self, X, y=None, **fit_params):
        return self

    def get_params(self, deep=True):
        return {}

def cleanText(text):
    text = text.strip().replace("\n", " ").replace("\r", " ")
    text = text.lower()
    return text

def tokenizeText(sample):
    tokens = parser(sample)
    lemmas = …
```
I built a classifier of documents using the vector representation of each document in the training set (i.e. a row in the Document-Term Matrix). Now I need to test the model on the test data. But how can I represent a new document with the Document-Term Matrix, since some terms might not be included in the training data?
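A minimal sketch of the usual handling, assuming scikit-learn: the vectorizer's vocabulary is fixed on the training set, and `transform` simply drops any term that was never seen during training, so new documents always land in the same column space:

```python
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["the cat sat", "the dog barked"]
test_docs = ["the cat meowed"]          # "meowed" never appears in training

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)  # learns the vocabulary
X_test = vectorizer.transform(test_docs)        # unseen terms are ignored

print(X_test.shape[1] == X_train.shape[1])      # True: same columns as training
```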
In mathematics, a vector has both magnitude and direction. In data science, to identify document similarity we convert each document into a feature vector and then apply the cosine formula between the source and target documents' feature vectors. However, the cosine formula is applicable only to vectors, and a vector should have both magnitude and direction. For a document that is represented as a vector, where is the direction?
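To make the geometry concrete, a small sketch with made-up term counts: each axis is one vocabulary term, so the "direction" of a document vector is the mix of terms it points toward, and cosine similarity compares only that direction, not the length:

```python
import numpy as np

# Axes: counts of the terms ["cat", "dog", "fish"] (toy example)
doc_a = np.array([2.0, 1.0, 0.0])
doc_b = np.array([4.0, 2.0, 0.0])   # same term mix, twice the magnitude

cos = doc_a @ doc_b / (np.linalg.norm(doc_a) * np.linalg.norm(doc_b))
print(cos)  # 1.0: identical direction, so magnitude plays no role
```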
I have multiple vector fields in one collection. My use case is to find similar sentences in similar contexts. The sentences and contexts are encoded as float vectors, so I have one vector for the sentence and another vector for the context (the surrounding text). I would like to take both vectors into consideration to find similar sentences. Unfortunately, most approximate nearest neighbor (ANN) search libraries only support searching over a single field. I have tried to use PostgreSQL with the cube extension …
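One common workaround (my suggestion, not from the question itself): concatenate the two normalized vectors with weights, so a single-field ANN index can score both at once. With L2-normalized parts, the inner product of two combined vectors equals the weighted sum of the sentence and context cosine similarities:

```python
import numpy as np

def combine(sentence_vec, context_vec, w_sentence=0.7, w_context=0.3):
    # Normalize each part so neither field dominates by raw magnitude,
    # then scale by sqrt of the weight and concatenate.
    s = sentence_vec / np.linalg.norm(sentence_vec)
    c = context_vec / np.linalg.norm(context_vec)
    return np.concatenate([np.sqrt(w_sentence) * s, np.sqrt(w_context) * c])

# Inner product of two combined vectors =
#   w_sentence * cos(sentence_1, sentence_2) + w_context * cos(context_1, context_2)
```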
I have vectors of the same length where each entry can have the value 0, 1 or null. V = {[0,1,1,1,null,0], [null,1,0,null,0,1], ...} How can I perform dimensionality reduction of these vectors into a lower-dimensional space (in this case 2D)?
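One possible approach (an assumption, not from the question): compute pairwise dissimilarities that ignore the null entries, then embed the resulting distance matrix with MDS:

```python
import numpy as np
from sklearn.manifold import MDS

# null encoded as np.nan
V = np.array([[0, 1, 1, 1, np.nan, 0],
              [np.nan, 1, 0, np.nan, 0, 1],
              [0, 0, 1, 1, 1, 0]])

n = len(V)
D = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        mask = ~np.isnan(V[i]) & ~np.isnan(V[j])   # entries observed in both vectors
        # Mean disagreement over jointly observed entries (Hamming-style);
        # fall back to maximal dissimilarity when there is no overlap.
        D[i, j] = np.abs(V[i, mask] - V[j, mask]).mean() if mask.any() else 1.0

coords = MDS(n_components=2, dissimilarity='precomputed').fit_transform(D)
print(coords.shape)  # (3, 2): one 2D point per original vector
```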
If we have a vector $q$ and a set of vectors $D = \{d_1, d_2, ..., d_l\}$, is there a way to create functions $QF$ and $DF$ such that $QF(q)^T DF(D) \approx \max_i(q^T d_i)$? Use case: I want to build an information retrieval system in which documents are represented by an arbitrary but small ($<100$) number of vectors and the query is represented by a single vector. Ideally, I would like to sort the documents based on $\max_i(q^T d_i)$, but storing all …
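Two grounding observations (standard identities, not a full answer): choosing $DF$ as mean pooling with $QF(q) = q$ gives a lower bound on the target, and log-sum-exp smooths the max with an explicit error bound, though it does not factor into a single inner product:

$$q^T \left(\frac{1}{l}\sum_{i=1}^{l} d_i\right) = \frac{1}{l}\sum_{i=1}^{l} q^T d_i \;\le\; \max_i q^T d_i, \qquad \max_i q^T d_i \;\le\; \frac{1}{\beta}\log\sum_{i=1}^{l} e^{\beta q^T d_i} \;\le\; \max_i q^T d_i + \frac{\log l}{\beta}.$$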
I have a few thousand faces (from the celebrity dataset LFW), each person represented by a 512 × 1 vector. I stored them in a vector DB to build a face-search system using embedded features (MTCNN for face detection and ArcFace as the embedding model). Someone suggested that I add many vectors as "dummy faces" to the database with an unknown class (the number of these vectors being larger than the number of person classes). It's still unclear to me why I need to add …
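For context, a common open-set lookup pattern (my assumption about what the suggestion is aiming at, not a confirmed explanation): only return a named identity when the best match clears a similarity threshold, otherwise fall back to "unknown":

```python
import numpy as np

def identify(query_emb, db_embs, db_labels, threshold=0.5):
    # Cosine similarity against every stored embedding
    # (rows of db_embs are assumed L2-normalized)
    sims = db_embs @ query_emb / np.linalg.norm(query_emb)
    best = int(np.argmax(sims))
    # Below the threshold, refuse to name anyone rather than force a match
    return db_labels[best] if sims[best] >= threshold else "unknown"
```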
Hi, I am working on a movie recommendation system and I have to measure the alikeness between the main user and other users. For example, the main user watched 3 specific movies and rated them 8, 5, 7. A user who happened to watch the same movies rated them 8, 2, 3; another user of the same kind rated those movies 7, 6, 6; and some other user watched only the first two movies and rated them 8, 5. Now the question …
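A minimal sketch of one standard choice (an assumption, since the question is cut off): Pearson correlation computed only over the movies both users have rated, which naturally handles the user who watched just two of them:

```python
import numpy as np

def pearson_sim(ratings_a, ratings_b):
    # ratings are dicts: movie -> rating; compare only co-rated movies
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # not enough overlap to correlate
    a = np.array([ratings_a[m] for m in common], dtype=float)
    b = np.array([ratings_b[m] for m in common], dtype=float)
    if a.std() == 0 or b.std() == 0:
        return 0.0  # constant ratings: correlation is undefined
    return float(np.corrcoef(a, b)[0, 1])

main = {"m1": 8, "m2": 5, "m3": 7}
print(pearson_sim(main, {"m1": 8, "m2": 2, "m3": 3}))
print(pearson_sim(main, {"m1": 7, "m2": 6, "m3": 6}))
print(pearson_sim(main, {"m1": 8, "m2": 5}))  # only two co-rated movies
```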
I have these 200 vectors, which were clustered using K-means based on the keyword-weight similarity given by TF-IDF (Term Frequency - Inverse Document Frequency). The vectors were clustered with respect to the vectors in four cities: Amsterdam, Rotterdam, The Hague and Utrecht. I chose k = 6 centroids, which means I have cluster 0 to cluster 5. For each cluster, I also calculated the average numerical keyword weight so that I …
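A minimal sketch of that pipeline, assuming scikit-learn and using random data as a stand-in for the real 200-row TF-IDF matrix:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((200, 745))           # stand-in for the real TF-IDF matrix

km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)

# Average keyword weight per cluster: mean TF-IDF value over its member vectors
for c in range(6):
    members = X[km.labels_ == c]
    print(c, len(members), members.mean(axis=0).mean())
```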
I want to find items that are similar to items users already have in their collection. Every item has attributes, so I created feature vectors where every element of the vector represents an attribute and is either $0$ or $1$ (depending on whether an item has that attribute). For the user collection I summed all the vectors, creating one vector which I then used to calculate similarities with other items. Is this a correct approach, or should I make this "user vector", …
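A small sketch of the approach and one variant worth testing (the binarized version is my assumption, not a verdict): with cosine similarity the overall scale of the summed vector does not matter, but summing versus capping at 1 changes how strongly attributes that recur across the collection dominate the match:

```python
import numpy as np

items = np.array([[1, 0, 1, 0],
                  [1, 1, 0, 0],
                  [1, 0, 0, 1]])          # binary attribute vectors in the collection

user_sum = items.sum(axis=0)              # [3, 1, 1, 1]: attribute counts
user_any = (user_sum > 0).astype(float)   # [1, 1, 1, 1]: "has the attribute at all"

candidate = np.array([1, 0, 1, 1])

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(user_sum, candidate), cos(user_any, candidate))  # different rankings possible
```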
Standard introductory examples in Word2Vec, like king - queen = man - woman and tokyo - japan = london - uk, involve one-to-one relationships between words: Tokyo is the exclusive capital of Japan. More generally, we might want to test for many-to-one relationships: e.g., we might want to ask whether Kyoto is a city in Japan. I presume we are still interested in vectors of the form kyoto - japan, houston - us, etc., but these vectors are no longer …
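A sketch of one way to probe this, assuming gensim `KeyedVectors` (the file path, the vocabulary tokens, and the averaging of known city-country offsets are all illustrative choices, not an established test):

```python
import numpy as np
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # hypothetical path

def offset(a, b):
    # a, b assumed present in the model's vocabulary
    return kv[a] - kv[b]

# Prototype "city in country" direction: average over known pairs
proto = np.mean([offset("houston", "us"), offset("osaka", "japan")], axis=0)

cand = offset("kyoto", "japan")
cos = cand @ proto / (np.linalg.norm(cand) * np.linalg.norm(proto))
print(cos)  # higher values suggest kyoto relates to japan like a city to its country
```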
I am trying to find a distance formula or method that gives a non-commutative distance between two points in a feature space. Suppose there are two movies represented in an $\mathbb{R}^n$ feature space. When I find the distance/similarity between these movies using their feature vectors, I want different values depending on which movie is the reference point, i.e., Dist(Mov1, Mov2) != Dist(Mov2, Mov1). I know this is slightly vague, but I …
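One standard asymmetric measure to consider (my suggestion, not from the question): the Kullback-Leibler divergence between the movies' feature vectors treated as probability distributions, which is non-commutative by construction:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # Normalize non-negative feature vectors into distributions
    p = p / p.sum()
    q = q / q.sum()
    # KL(p || q) != KL(q || p): the reference point matters
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

mov1 = np.array([0.9, 0.05, 0.05])
mov2 = np.array([0.4, 0.4, 0.2])
print(kl_divergence(mov1, mov2), kl_divergence(mov2, mov1))  # two different values
```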
Let's say I have my 300-dimensional word embedding trained with Word2Vec, containing 10,000 word vectors. I have additional data on the 10,000 words in the form of a vector (10,000 × 1) containing values between 0 and 1. Can I simply append the vector to the word embedding so that I have a 301-dimensional embedding? I am looking to calculate similarities between word vectors using cosine similarity.
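A sketch of the mechanics, with a scaling factor `alpha` as a knob I am assuming you would want: a single extra value in [0, 1] is tiny next to 300 unnormalized dimensions, so how much it moves the cosine similarity depends entirely on its relative scale:

```python
import numpy as np

emb = np.random.randn(10000, 300)          # stand-in for the trained embeddings
extra = np.random.rand(10000, 1)           # additional per-word feature in [0, 1]

alpha = 1.0                                 # weight given to the extra dimension
emb301 = np.hstack([emb, alpha * extra])    # (10000, 301)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(emb301[0], emb301[1]))  # compare against cos(emb[0], emb[1]) to see the effect
```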
I'm trying to model the relationship between a declared value from a subject and a stimulus. For example, modeling the relationship between the subject's happiness and the strength of a stimulus so that we can predict the subject's sadness from stimuli. (The happiness values are five-point ratings; the stimuli are continuous values.) Emotions like happiness are obviously autocorrelated, and I think modeling these autocorrelations might help the model make better predictions. However, we can only observe happiness (the actual value) in the training phase …
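A minimal sketch of one way to encode that autocorrelation (an assumption, since the question is cut off): add the previous rating as a lagged feature during training; at prediction time, where the true past rating is unobserved, feed the model's own previous prediction back in:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: stimulus strength and the declared 1-5 rating over time
stimulus = np.array([0.1, 0.4, 0.5, 0.9, 0.8, 0.3, 0.2, 0.7])
rating = np.array([1, 2, 3, 5, 4, 2, 2, 4])

# Features: current stimulus + previous rating (lag-1)
X = np.column_stack([stimulus[1:], rating[:-1]])
y = rating[1:]
model = LinearRegression().fit(X, y)

# At test time the true past rating is unavailable: recurse on predictions
prev = rating[-1]
for s in [0.6, 0.1]:
    prev = model.predict([[s, prev]])[0]
    print(prev)
```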
I have two vector-space models with different dimensionalities. The number of vectors in one model is the same as the number of vectors in the other, i.e., if I have a vector representation for a car in one model, I have a vector representation for a car in the other model, but the number of dimensions can be different. I want to combine these models (and then cluster using the combined model). I cannot average (BoW) or add these models together, as …
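One common combination (my assumption about a workable route, since the question is cut off): L2-normalize each model's vector for an item and concatenate them, so both spaces contribute on a comparable scale before clustering:

```python
import numpy as np
from sklearn.cluster import KMeans

model_a = np.random.randn(1000, 100)   # stand-ins for the two vector-space models
model_b = np.random.randn(1000, 300)   # same items, different dimensionality

def l2norm(M):
    # Row-normalize so neither model dominates by raw vector magnitude
    return M / np.linalg.norm(M, axis=1, keepdims=True)

combined = np.hstack([l2norm(model_a), l2norm(model_b)])   # (1000, 400)
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(combined)
```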
Long story short, I have 200 vectors as a result of running TF-IDF (Term Frequency - Inverse Document Frequency) on thousands of keywords across hundreds of vectors. The total number of unique keywords I got is 745, meaning that there are 745 dimensions/axes. Now, I was wondering: how does K-means clustering work on those 200 vectors? Is it accurate to say that K-means clusters those 200 vectors by keyword-weight similarity?
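To make the mechanics concrete, a toy sketch of the K-means loop in that 745-dimensional space ("similarity" here is really Euclidean closeness of the TF-IDF weight vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 745))                  # stand-in for the 200 TF-IDF vectors
centroids = X[rng.choice(200, size=6, replace=False)]

for _ in range(10):
    # Assignment step: each vector joins the centroid nearest in Euclidean distance
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each centroid moves to the mean of its members
    # (keep the old centroid if a cluster ends up empty)
    centroids = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                          else centroids[c] for c in range(6)])
```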
What is the difference between the positional vector and the attention vector used in the Transformer model? I saw a video on YouTube in which the definition of a positional vector was given as "a vector that gives context based on the position of a word in a sentence", and the definition of an attention vector was given as "for every word we can have an attention vector generated which captures the contextual relationship between words in a sentence". Capturing context information based on distance (positional vector) and attention (attention vector) …
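A compact sketch of both pieces as they appear in the original Transformer (fixed sinusoidal position encodings versus attention outputs computed from the data), with toy shapes:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Each position gets a fixed vector of sines/cosines at varying frequencies,
    # so the model can tell words apart by where they sit in the sentence.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def attention(Q, K, V):
    # Each word's output (its "attention vector") is a weighted mix of all words'
    # values, weighted by how well its query matches the other words' keys.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

x = np.random.randn(5, 8) + positional_encoding(5, 8)  # 5 words, d_model = 8
out = attention(x, x, x)   # self-attention over the position-aware embeddings
```

So the positional vector is a fixed, input-independent label for "where", while the attention vector is computed per word from the whole sentence and captures "what relates to what".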
Several neural network libraries such as TensorFlow and PyTorch offer an Embedding layer. Having implemented word2vec in the past, I understand the reasoning behind wanting a lower-dimensional representation. However, it would seem the embedding layer is just a linear layer. All other things being equal, would an embedding layer not just learn the same weights as the equivalent linear layer? If so, then what are the advantages of using an embedding layer? In the case of word2vec, the lower …
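A quick check of the equivalence in PyTorch: an embedding lookup returns exactly what multiplying a one-hot vector by the same weight matrix would, so the layer is the efficient, sparse version of that matrix product (a lookup instead of a full matmul over the vocabulary):

```python
import torch
import torch.nn.functional as F

vocab, dim = 10, 4
emb = torch.nn.Embedding(vocab, dim)

idx = torch.tensor([3, 7])
one_hot = F.one_hot(idx, num_classes=vocab).float()

lookup = emb(idx)                 # direct table lookup, no matmul
matmul = one_hot @ emb.weight     # one-hot times the same weight matrix

print(torch.allclose(lookup, matmul))  # True: identical outputs and gradients flow
```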
I don't understand how n-grams are language-independent. I've read that by using the character n-grams of a word, rather than the word itself, as the dimensions of a vector space model, we can skip language-dependent pre-processing such as stemming and stop-word removal. Can someone please provide the reasoning for this?
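A small illustration of the mechanism, assuming scikit-learn: character trigrams of inflected forms overlap heavily without any stemmer, and the same extractor runs unchanged on any language's text since it never consults a vocabulary or grammar:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["connect connected connecting", "verbinden verbunden"]  # English and German

vec = CountVectorizer(analyzer="char_wb", ngram_range=(3, 3))
X = vec.fit_transform(docs)

# "con", "onn", "nne", ... are shared by all the "connect*" forms, so related
# inflections land near each other with no language-specific stemming rules.
print(sorted(vec.get_feature_names_out())[:10])
```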