faster alternatives to sparse.model.matrix?

I have a large dataset that is entirely categorical. I'm trying to train on it with xgboost, so I must first convert the categorical data to numerical. So far I've been using sparse.model.matrix() from the Matrix library, but it is far too slow. I found a great solution here; however, the sparse matrix it returns is not the same as the one sparse.model.matrix returns. I know there is a way to force sparse.model.matrix to return identical output as the solution in …
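For reference, here is the underlying idea in Python terms (the question is about R, so treat this only as an illustrative sketch: build the sparse one-hot matrix directly from integer category codes instead of going through a model-matrix routine; pandas and scipy are assumed):

import numpy as np
import pandas as pd
from scipy import sparse

# Toy all-categorical frame; real data would be much larger.
df = pd.DataFrame({"color": ["red", "blue", "red"],
                   "size":  ["S", "M", "S"]})

blocks = []
for col in df.columns:
    codes, _ = pd.factorize(df[col])          # integer code per category level
    n_levels = codes.max() + 1
    rows = np.arange(len(df))
    data = np.ones(len(df))
    # One sparse block per column: row i has a 1 in the column of its level.
    blocks.append(sparse.csc_matrix((data, (rows, codes)),
                                    shape=(len(df), n_levels)))

X = sparse.hstack(blocks).tocsr()             # ready for xgboost.DMatrix(X)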
Category: Data Science

Using large CNNs (e.g., ResNet) in convolutional autoencoders for image representation learning

I am confused about which CNNs are generally used inside autoencoder architectures for learning image representations. Is it more common to use a large existing network like ResNet or VGG, or do most people write their own smaller networks? What are the pros and cons of each? If people are using a large network like ResNet or VGG, does the decoder mirror the same steps taken by the encoder, or can a simpler decoding network be used? I am …
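One common pattern (a hedged sketch, not a claim about what most people do) is to keep a ResNet as the encoder and attach a small, non-mirrored decoder of transposed convolutions; torch and torchvision are assumed:

import torch
import torch.nn as nn
from torchvision import models

class ResNetAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=None)  # or a pretrained checkpoint
        # Encoder: everything up to (but excluding) avgpool/fc.
        self.encoder = nn.Sequential(*list(resnet.children())[:-2])
        # Decoder: deliberately simple, does NOT mirror ResNet's blocks.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):                 # x: (B, 3, 224, 224)
        z = self.encoder(x)               # (B, 512, 7, 7)
        return self.decoder(z)            # (B, 3, 224, 224)

The trade-off this illustrates: a large pretrained encoder gives strong features cheaply, while the decoder only needs enough capacity to drive the reconstruction loss, so it rarely has to mirror the encoder step for step.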
Category: Data Science

A way to initialize sentence embeddings for unsupervised text clustering that is better than GloVe word vectors?

For unsupervised text clustering, the key thing is the initial embedding of the text. If we want to use DeepCluster for text, the problem is how to get the initial embedding from a deep model. BERT does not give good initial embeddings for this. If we do not use a deep model, is there a better way to get embeddings than GloVe word vectors?
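One commonly used option (a sketch assuming the sentence-transformers package and one of its published checkpoints) is a sentence encoder fine-tuned for semantic similarity, which tends to cluster better than raw BERT CLS vectors or averaged GloVe vectors:

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

sentences = ["the cat sat on the mat", "dogs chase cats", "stock prices fell"]

# all-MiniLM-L6-v2 is one published checkpoint; any similar model works.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(sentences, normalize_embeddings=True)

labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(labels)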
Category: Data Science

Are there any graph embedding algorithms like this already?

I wrote an algorithm for generating node embeddings based on the graph's topology. Most of the explanation is in the readme file and the examples. The question is: am I reinventing the wheel? Does this approach have any practical advantages over existing solutions for embedding generation? Yes, I'm aware there are many algorithms for this based on random walks, but this one is purely deterministic linear algebra, and it is quite simple from my perspective. In short, the algorithm …
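For context, the classical deterministic linear-algebra baseline such an algorithm would be compared against is spectral embedding (Laplacian eigenmaps); a minimal sketch, with networkx and scipy assumed:

import networkx as nx
import numpy as np
from scipy.sparse import csgraph

G = nx.karate_club_graph()
A = nx.to_scipy_sparse_array(G, format="csr", dtype=float)
L = csgraph.laplacian(A, normed=True)

# Eigenvectors of the normalized Laplacian give a deterministic,
# purely linear-algebraic node embedding.
vals, vecs = np.linalg.eigh(L.toarray())
embedding = vecs[:, 1:9]        # drop the trivial eigenvector, keep 8 dims
print(embedding.shape)          # (34, 8)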
Category: Data Science

Is Self-Supervised Learning a task of Representation Learning?

Maybe a weird question, but: currently I'm writing a seminar paper about Self-Supervised Learning for time series data. For this paper, I have to find methods to prepare unlabelled time series data with SSL techniques in order to perform a classification task. In a scientific paper, I was able to find time series representation learning methods. Another SSL paper used one of those methods to do the classification on a specific dataset. Now I have to admit that I'm kind of …
Category: Data Science

Neural Nets: unordered sets of ordered tuples as features of data

I'm working on a very small-scale pet project in which inputs are essentially sets of (x, y) pairs that are to be classified into categories using deep learning, specifically Keras (I know this may not be the best tool for this, but it's more of a proof of concept / I want to try it out). However, I'm not sure how to represent the data. I'm starting with a simple classification problem (i.e. if (a, b) is …
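A common way to handle unordered sets of tuples (a sketch of the Deep Sets idea; all shapes and sizes below are assumptions) is to apply the same small network to every (x, y) pair and then use a permutation-invariant pooling such as a mean:

import tensorflow as tf
from tensorflow.keras import layers

MAX_PAIRS = 32   # assumed: sets padded/truncated to a fixed length
NUM_CLASSES = 3  # assumed

inputs = tf.keras.Input(shape=(MAX_PAIRS, 2))        # a set of (x, y) pairs
h = layers.Dense(64, activation="relu")(inputs)      # same MLP on each pair
h = layers.Dense(64, activation="relu")(h)
pooled = layers.GlobalAveragePooling1D()(h)          # order-invariant pooling
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(pooled)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

Because the pooling averages over the pair axis, shuffling the pairs in a set does not change the prediction, which matches the "unordered set" requirement.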
Category: Data Science

Does feature engineering require absolute accuracy?

Sometimes when I'm studying datasets, the text fields are particularly challenging to handle. For whatever features I want to derive from the text fields, I apply some heuristic to approximate certain text patterns so I can extract features. (Think of those heuristics as self-invented regexes...) I'm concerned about the soundness of such heuristics: do people in practice also approximate and extract features with heuristics only? (e.g. it may lead to …
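As a concrete illustration of such heuristic features (the patterns and data below are invented for the example, not taken from the question):

import pandas as pd

texts = pd.Series(["Order #1234 shipped 2024-01-05",
                   "refund requested!!!",
                   "Contact us at support@example.com"])

# Each heuristic is an approximation: it will have false positives/negatives.
features = pd.DataFrame({
    "has_order_id": texts.str.contains(r"#\d+"),
    "has_date":     texts.str.contains(r"\d{4}-\d{2}-\d{2}"),
    "has_email":    texts.str.contains(r"\b\S+@\S+\.\w+\b"),
    "n_exclaim":    texts.str.count("!"),
})
print(features)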
Category: Data Science

KNN efficient implementation

The KNN algorithm is very handy and particularly suited to some of my problems, but I can't find any resources on how to implement it in production. As a comparative example, when I use a neural network I already have high-level tools at my disposal for applying it to examples (either libraries that smartly exploit the hardware of my devices when I want to do embedded work, or infrastructure that lets me use my neural …
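For production use the usual route is a prebuilt neighbor index rather than a hand-rolled scan; a minimal sketch with scikit-learn (libraries like faiss, Annoy, or hnswlib are the common faster, approximate alternatives at scale):

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 32))       # indexed "training" vectors

# Tree-based index built once, then reused for all queries.
index = NearestNeighbors(n_neighbors=5, algorithm="ball_tree").fit(X)

queries = rng.normal(size=(3, 32))
distances, neighbor_ids = index.kneighbors(queries)
print(neighbor_ids)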
Category: Data Science

Using categorical and continuous variables in Deep Learning

I would like to apply an MLP to some business seller data. The data is a mix of both categorical and continuous features. From what I have read, it is not advisable to feed a neural network both types of data directly (reference unknown/unavailable), and I remember reading that one can use the following model:

Categorical variables --> NN model 1 --+
                                       +--> NN model 3 --> Output
Continuous variables  --> NN model 2 --+

So in this model we have two neural networks that …
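A sketch of that two-branch layout in Keras (layer sizes, input shapes, and the vocabulary size are assumptions for illustration):

import tensorflow as tf
from tensorflow.keras import layers

N_CATEGORIES = 50   # assumed: distinct levels of the categorical feature
N_CONTINUOUS = 8    # assumed: number of continuous features

# Branch 1 ("NN model 1"): categorical input goes through an embedding.
cat_in = tf.keras.Input(shape=(1,), dtype="int32")
cat = layers.Embedding(N_CATEGORIES, 8)(cat_in)
cat = layers.Flatten()(cat)

# Branch 2 ("NN model 2"): continuous input gets its own dense layer.
num_in = tf.keras.Input(shape=(N_CONTINUOUS,))
num = layers.Dense(16, activation="relu")(num_in)

# "NN model 3": merge both branches and predict.
merged = layers.Concatenate()([cat, num])
h = layers.Dense(32, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid")(h)

model = tf.keras.Model([cat_in, num_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")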
Category: Data Science

Good chromosome representation in a VRPTW genetic algorithm

I have a genetic algorithm for a vehicle routing problem with time windows and I need to implement certain modifications. I am not sure what the best chromosome representation would be. I have tasks which can be divided into 3 sub-tasks with certain ordered time windows; they have to be processed in order, all 3 (they represent collecting certain goods at a storage, delivering them, and returning packaging to another storage). In the crossover part of the algorithm these tasks are combined …
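One possible encoding (purely illustrative, with invented names; not a claim about the best representation) keeps each task's three sub-tasks inside a single atomic gene, so crossover can reorder tasks but never split a task's collect/deliver/return sequence:

import random

def make_chromosome(task_ids):
    """A chromosome is a permutation of atomic task genes;
    each gene carries its fixed internal sub-task order."""
    order = list(task_ids)
    random.shuffle(order)
    return [(tid, ("collect", "deliver", "return")) for tid in order]

def order_crossover(parent_a, parent_b):
    """Order-crossover (OX-style) on task ids; the sub-task
    sequence inside each gene is preserved by construction."""
    n = len(parent_a)
    i, j = sorted(random.sample(range(n), 2))
    segment = parent_a[i:j]
    seg_ids = {gene[0] for gene in segment}
    rest = [gene for gene in parent_b if gene[0] not in seg_ids]
    return rest[:i] + segment + rest[i:]

chrom = make_chromosome(range(5))
child = order_crossover(chrom, make_chromosome(range(5)))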
Category: Data Science

Why are wavelet transforms not scale-equivariant?

One can rely on continuous wavelets to build a multi-resolution analysis that is equivariant ("covariant") under the action of a discrete subgroup of translations. When not downsampled, the multi-resolution analysis of a 1D signal can be seen as a matrix of n x m coefficients, where n is the number of octaves one wants to capture and m is the number of translated wavelets considered in each octave. Equivariance to translation in this case means that a certain translation of the …
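For reference, the two covariances can be written explicitly (a standard identity, with the usual $1/\sqrt{a}$ normalization assumed):

$$ Wx(a,b) = \frac{1}{\sqrt{a}} \int x(t)\, \overline{\psi\!\left(\frac{t-b}{a}\right)}\, dt, \qquad W[x(\cdot-\tau)](a,b) = Wx(a,b-\tau), $$

and for dilation, $W[x(\cdot/s)](a,b) = \sqrt{s}\, Wx(a/s,\, b/s)$. The dilation identity holds over the continuous family of scales; once scales are sampled on a dyadic grid of octaves it survives only for $s = 2^k$, which is one way to see why the discretized analysis is not equivariant under arbitrary rescalings.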
Category: Data Science

Representation Learning - Self-supervision methods that do well with a limited number of classes when

I understand that a contrastive learning approach such as SimCLR has an inherent problem when dealing with a low number of classes (say 2, 3, 5, 6, maybe even 10): the chance of picking a negative sample that has the same label as the image in the positive pair is not low (say, a dog and another dog). Which contrastive learning approaches do better on such problems, where we have, say, 4 classes rather than 1000 (or …
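To make the problem quantitative: assuming $C$ balanced classes and negatives drawn uniformly from the batch, the probability that a random negative shares the anchor's label is approximately

$$ P(\text{false negative}) \approx \frac{1}{C}, $$

so with $C = 4$ roughly a quarter of the negatives collide with the positive class, versus about $0.1\%$ when $C = 1000$.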
Category: Data Science

Output representation for a neural network to learn grid-based game with multiple units

I have a round-based game played on a grid map with multiple units that I would like to control in some fashion using a neural network (NN). All of the units are moved at once. Each unit can move in any of the grid map directions: $up$, $down$, $left$ and $right$. So if we have $n$ units, then the output policy vector of the NN would need $4^n$ entries to represent probabilities, one for each joint move. Note that one move represents actions …
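A common workaround, sketched below under assumed sizes, is to factorize the policy into $n$ independent 4-way softmax heads, giving $4n$ outputs instead of $4^n$ (at the cost of ignoring coordination between units):

import torch
import torch.nn as nn

N_UNITS = 5       # assumed number of units
OBS_DIM = 128     # assumed size of the encoded board state

class FactorizedPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU())
        # One 4-way head per unit: 4*n outputs, not 4**n.
        self.heads = nn.Linear(256, N_UNITS * 4)

    def forward(self, obs):
        logits = self.heads(self.body(obs)).view(-1, N_UNITS, 4)
        # Per-unit distribution over up/down/left/right.
        return torch.softmax(logits, dim=-1)

policy = FactorizedPolicy()
probs = policy(torch.randn(1, OBS_DIM))   # shape (1, N_UNITS, 4)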
Category: Data Science

How can I get the vector of a word using BERT?

I need to get word vectors using BERT, and I found this function that I think should be the one I need:

import torch
import transformers

def get_bert_embed_matrix(sentences):
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model_config = transformers.AutoConfig.from_pretrained('bert-base-uncased', output_hidden_states=True)
    model = transformers.AutoModel.from_pretrained('bert-base-uncased', config=model_config).to(device)
    tokenizer = transformers.AutoTokenizer.from_pretrained('bert-base-uncased')
    model.eval()
    all_embeddings = []
    for sentence in sentences:
        tokenized_text = tokenizer.tokenize(sentence)
        indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
        tokens_tensor = torch.tensor([indexed_tokens]).to(device)
        with torch.no_grad():
            outputs = model(tokens_tensor)
        hidden_states = outputs[2]  # all layers, available via output_hidden_states=True
        # Concatenate the last four hidden layers for each token.
        word_embed = torch.cat([hidden_states[layer] for layer in [-1, -2, -3, -4]], dim=-1)
        all_embeddings.append(word_embed)
    return all_embeddings

Does the method return vectors for sub-word or word …
Category: Data Science

Can anyone please help me understand what a disentangled hierarchical representation is?

I understand that disentangled representations are those in which each dimension represents only one property. For example, say we have pictures of digits whose latent representation Z has Z1 representing which digit, Z2 the slant, and Z3 the thickness. However, I am not able to understand what disentanglement means in the case of a hierarchical representation.
Category: Data Science

Can we consider meta-features of a dataset as its embedding?

While reading some works on meta-learning, I had this doubt: can we consider the meta-features of a dataset as its embedding, given that meta-features are a lower-dimensional representation that also tries to retain properties of the dataset? Embeddings are essentially low-dimensional representations of some high-dimensional concept. Is it fair to use "embeddings" instead of "meta-features"? Or can we use "representation" instead of "meta-features"?
Category: Data Science

Representing user information

I have the task of representing a user feature matrix. I have features like gender, age, etc., but I also have a multi-valued feature called "movies watched", which is essentially another table of movie names watched by that user, each with a numeric duration; the order of the movies does not matter here. Also, the number of movies watched can range from 20 to 300. So what is the best way of representing "movies watched" as a feature vector?
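One straightforward option (a sketch with made-up data) is a fixed-length multi-hot vector over the movie catalogue, optionally weighted by watch duration; mean-pooling learned movie embeddings is the usual alternative when the catalogue is large:

import numpy as np

movie_vocab = {"Alien": 0, "Up": 1, "Heat": 2, "Tron": 3}  # full catalogue

def movies_to_vector(watched):
    """watched: list of (movie_name, minutes_watched) for one user."""
    v = np.zeros(len(movie_vocab))
    for name, minutes in watched:
        v[movie_vocab[name]] = minutes      # duration-weighted multi-hot
    return v

user_vec = movies_to_vector([("Alien", 117), ("Tron", 96)])
# Concatenate with the other per-user features (gender, age, ...).
print(user_vec)   # [117.   0.   0.  96.]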
Category: Data Science

Understanding fastText

fastText is Facebook's open-source software for obtaining word embeddings (the original paper). Given $N$ documents, where document $n$ is represented by a normalized bag of n-gram features $x_n$ (so the corpus is $\{x_1, x_2, \cdots, x_N\}$), the objective the system tries to optimize is $$ -\frac{1}{N} \sum_{n=1}^N y_n \log(f(BA x_n)) $$ where $B$ and $A$ are weight matrices (factorized for performance reasons), $y_n$ is the class label, and $f(\cdot)$ is the softmax function. Despite the empirical gains reported in the paper, I find this formulation quite unusual …
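The formulation amounts to a linear classifier over averaged n-gram embeddings; a minimal PyTorch sketch of $f(BAx_n)$, with all sizes assumed:

import torch
import torch.nn as nn

VOCAB = 10_000    # assumed n-gram vocabulary size
DIM = 64          # assumed embedding dimension (rank of the factorization)
CLASSES = 5       # assumed number of labels

# A: averages n-gram embeddings (EmbeddingBag does lookup + mean).
A = nn.EmbeddingBag(VOCAB, DIM, mode="mean")
# B: maps the document embedding to class scores.
B = nn.Linear(DIM, CLASSES)

ngram_ids = torch.tensor([3, 17, 256, 42])       # one document's n-grams
offsets = torch.tensor([0])                      # batch of one document
logits = B(A(ngram_ids, offsets))                # = B A x_n
log_probs = torch.log_softmax(logits, dim=-1)    # log f(.)
loss = -log_probs[0, 2]                          # NLL term for true label 2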
Category: Data Science

How to represent a "switch"-like behavior in a neural network?

I have three input variables $x_1$, $x_2$ and $d$, where $x_1$ and $x_2$ are numerical variables and $d$ is a dummy variable that takes the value of 1 or 2. How to represent the part of a neural network in the black box so that when $d=1$, $x_1$ and $x_2$ are sent to layer $T_1$ for transformation, and when $d=2$, $x_1$ and $x_2$ are sent to layer $T_2$ for transformation?
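One way to implement such a switch (a sketch; soft gating is one common choice, not the only one) is to compute both transformations and select between them with a mask derived from $d$, which keeps everything differentiable:

import torch
import torch.nn as nn

class SwitchedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.T1 = nn.Linear(2, 8)   # transformation used when d == 1
        self.T2 = nn.Linear(2, 8)   # transformation used when d == 2
        self.head = nn.Linear(8, 1)

    def forward(self, x, d):
        # x: (B, 2) holding [x1, x2]; d: (B,) with values 1 or 2.
        gate = (d == 1).float().unsqueeze(1)          # 1 -> T1, 0 -> T2
        h = gate * torch.relu(self.T1(x)) + (1 - gate) * torch.relu(self.T2(x))
        return self.head(h)

net = SwitchedNet()
out = net(torch.randn(4, 2), torch.tensor([1, 2, 2, 1]))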
Category: Data Science

How to JUST represent words as embeddings with a pretrained BERT?

I don't have enough data (i.e. I don't have enough texts); I have only around 4k words in my dictionary. I need to compare given words, so I need to represent each one as an embedding. After representing the words I want to cluster them and find similar vectors (i.e. words). Maybe even then classify them into given classes (the classification here is unsupervised, since I don't have labeled data to train on). I know that almost any task can be …
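Once each word has a vector (for instance from a pretrained BERT encoder), clustering and similarity search need no labels; a sketch assuming a (4000, 768) embedding matrix (random numbers stand in for the real vectors):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4000, 768))    # stand-in for BERT vectors

# Unsupervised grouping of the 4k-word vocabulary.
labels = KMeans(n_clusters=20, n_init=10).fit_predict(embeddings)

# Most similar words to word 0 by cosine similarity (index 0 is itself).
sims = cosine_similarity(embeddings[:1], embeddings)[0]
nearest = np.argsort(-sims)[1:6]
print(labels[:10], nearest)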
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.