faster alternatives to sparse.model.matrix?

I have a large dataset that is entirely categorical. I'm trying to train on it with xgboost, so I must first convert the categorical data to numerical. So far I've been using sparse.model.matrix() from the Matrix library, but it is far too slow. I found a great solution here; however, the sparse matrix it returns is not the same as the one sparse.model.matrix returns. I know there is a way to force sparse.model.matrix to return identical output as the solution in …
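For reference, here is the underlying idea in Python terms (the question is about R, so treat this only as an illustrative sketch: build the sparse one-hot matrix directly from integer category codes instead of going through a model-matrix routine; pandas and scipy are assumed):

import numpy as np
import pandas as pd
from scipy import sparse

# Toy all-categorical frame; real data would be much larger.
df = pd.DataFrame({"color": ["red", "blue", "red"],
                   "size":  ["S", "M", "S"]})

blocks = []
for col in df.columns:
    codes, _ = pd.factorize(df[col])          # integer code per category level
    n_levels = codes.max() + 1
    rows = np.arange(len(df))
    data = np.ones(len(df))
    # One sparse block per column: row i has a 1 in the column of its level.
    blocks.append(sparse.csc_matrix((data, (rows, codes)),
                                    shape=(len(df), n_levels)))

X = sparse.hstack(blocks).tocsr()             # ready for xgboost.DMatrix(X)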
Category: Data Science

Using large CNNs (e.g., ResNet) in convolutional autoencoders for image representation learning

I am confused about which CNNs are generally used inside autoencoder architectures for learning image representations. Is it more common to use a large existing network like ResNet or VGG, or do most people write their own smaller networks? What are the pros and cons of each? If people are using a large network like ResNet or VGG, does the decoder mirror the same steps taken by the encoder, or can a simpler decoding network be used? I am …
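One common pattern (a hedged sketch, not a claim about what most people do) is to keep a ResNet as the encoder and attach a small, non-mirrored decoder of transposed convolutions; torch and torchvision are assumed:

import torch
import torch.nn as nn
from torchvision import models

class ResNetAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=None)  # or a pretrained checkpoint
        # Encoder: everything up to (but excluding) avgpool/fc.
        self.encoder = nn.Sequential(*list(resnet.children())[:-2])
        # Decoder: deliberately simple, does NOT mirror ResNet's blocks.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):                 # x: (B, 3, 224, 224)
        z = self.encoder(x)               # (B, 512, 7, 7)
        return self.decoder(z)            # (B, 3, 224, 224)

The trade-off this illustrates: a large pretrained encoder gives strong features cheaply, while the decoder only needs enough capacity to drive the reconstruction loss, so it rarely has to mirror the encoder step for step.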
Category: Data Science

A way to initialize sentence embeddings for unsupervised text clustering that is better than GloVe word vectors?

For unsupervised text clustering, the key thing is the initial embedding of the text. If we want to use DeepCluster for text, the problem is how to get the initial embedding from a deep model. BERT does not give good initial embeddings for this. If we do not use a deep model, is there a better way to get embeddings than GloVe word vectors?
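One commonly used option (a sketch assuming the sentence-transformers package and one of its published checkpoints) is a sentence encoder fine-tuned for semantic similarity, which tends to cluster better than raw BERT CLS vectors or averaged GloVe vectors:

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

sentences = ["the cat sat on the mat", "dogs chase cats", "stock prices fell"]

# all-MiniLM-L6-v2 is one published checkpoint; any similar model works.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(sentences, normalize_embeddings=True)

labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(labels)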
Category: Data Science

Are there any graph embedding algorithms like this already?

I wrote an algorithm for generating node embeddings based on the graph's topology. Most of the explanation is in the readme file and the examples. The question is: am I reinventing the wheel? Does this approach have any practical advantages over existing solutions for embedding generation? Yes, I'm aware there are many algorithms for this based on random walks, but this one is purely deterministic linear algebra, and it is quite simple from my perspective. In short, the algorithm …
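For context, the classical deterministic linear-algebra baseline such an algorithm would be compared against is spectral embedding (Laplacian eigenmaps); a minimal sketch, with networkx and scipy assumed:

import networkx as nx
import numpy as np
from scipy.sparse import csgraph

G = nx.karate_club_graph()
A = nx.to_scipy_sparse_array(G, format="csr", dtype=float)
L = csgraph.laplacian(A, normed=True)

# Eigenvectors of the normalized Laplacian give a deterministic,
# purely linear-algebraic node embedding.
vals, vecs = np.linalg.eigh(L.toarray())
embedding = vecs[:, 1:9]        # drop the trivial eigenvector, keep 8 dims
print(embedding.shape)          # (34, 8)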
Category: Data Science

Is Self-Supervised Learning a task of Representation Learning?

Maybe a weird question, but: currently I'm writing a seminar paper about Self-Supervised Learning for time series data. For this paper, I have to find methods to prepare unlabelled time series data with SSL techniques in order to perform a classification task. In a scientific paper, I was able to find time series representation learning methods. Another SSL paper used one of those methods to do the classification on a specific dataset. Now I have to admit that I'm kind of …
Category: Data Science

Neural Nets: unordered sets of ordered tuples as features of data

I'm working on a very small-scale pet project in which inputs are essentially sets of (x, y) pairs that are to be classified into categories using deep learning, specifically Keras (I know this may not be the best tool for this, but it's more of a proof of concept / I want to try it out). However, I'm not sure how to represent the data. I'm starting with a simple classification problem (i.e. if (a, b) is …
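A common way to handle unordered sets of tuples (a sketch of the Deep Sets idea; all shapes and sizes below are assumptions) is to apply the same small network to every (x, y) pair and then use a permutation-invariant pooling such as a mean:

import tensorflow as tf
from tensorflow.keras import layers

MAX_PAIRS = 32   # assumed: sets padded/truncated to a fixed length
NUM_CLASSES = 3  # assumed

inputs = tf.keras.Input(shape=(MAX_PAIRS, 2))        # a set of (x, y) pairs
h = layers.Dense(64, activation="relu")(inputs)      # same MLP on each pair
h = layers.Dense(64, activation="relu")(h)
pooled = layers.GlobalAveragePooling1D()(h)          # order-invariant pooling
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(pooled)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

Because the pooling averages over the pair axis, shuffling the pairs in a set does not change the prediction, which matches the "unordered set" requirement.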
Category: Data Science

Does feature engineering require absolute accuracy?

Sometimes when I'm studying datasets, the text fields are particularly challenging to handle. For whatever features I want to derive from the text fields, I apply some heuristic to approximate certain text patterns so I can extract features. (Think of those heuristics as self-invented regexes...) I'm concerned about the soundness of such heuristics: do people in practice also approximate and extract features with heuristics only? (e.g. it may lead to …
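As a concrete illustration of such heuristic features (the patterns and data below are invented for the example, not taken from the question):

import pandas as pd

texts = pd.Series(["Order #1234 shipped 2024-01-05",
                   "refund requested!!!",
                   "Contact us at support@example.com"])

# Each heuristic is an approximation: it will have false positives/negatives.
features = pd.DataFrame({
    "has_order_id": texts.str.contains(r"#\d+"),
    "has_date":     texts.str.contains(r"\d{4}-\d{2}-\d{2}"),
    "has_email":    texts.str.contains(r"\b\S+@\S+\.\w+\b"),
    "n_exclaim":    texts.str.count("!"),
})
print(features)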
Category: Data Science

KNN efficient implementation

The KNN algorithm is very handy and particularly suited to some of my problems, but I can't find any resources on how to implement it in production. As a comparative example, when I use a neural network I already have high-level tools at my disposal for applying it to examples (either libraries that smartly exploit the hardware of my devices when I want to do embedded work, or infrastructure that lets me use my neural …
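For production use the usual route is a prebuilt neighbor index rather than a hand-rolled scan; a minimal sketch with scikit-learn (libraries like faiss, Annoy, or hnswlib are the common faster, approximate alternatives at scale):

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 32))       # indexed "training" vectors

# Tree-based index built once, then reused for all queries.
index = NearestNeighbors(n_neighbors=5, algorithm="ball_tree").fit(X)

queries = rng.normal(size=(3, 32))
distances, neighbor_ids = index.kneighbors(queries)
print(neighbor_ids)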
Category: Data Science

Using categorical and continuous variables in Deep Learning

I would like to apply an MLP to some business seller data. The data is a mix of both categorical and continuous features. From what I have read, it is not advisable to feed a neural network both types of data directly (reference unknown/unavailable), and I remember reading that one can use the following model:

Categorical variables --> NN model 1 --+
                                       +--> NN model 3 --> Output
Continuous variables  --> NN model 2 --+

So in this model we have two neural networks that …
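A sketch of that two-branch layout in Keras (layer sizes, input shapes, and the vocabulary size are assumptions for illustration):

import tensorflow as tf
from tensorflow.keras import layers

N_CATEGORIES = 50   # assumed: distinct levels of the categorical feature
N_CONTINUOUS = 8    # assumed: number of continuous features

# Branch 1 ("NN model 1"): categorical input goes through an embedding.
cat_in = tf.keras.Input(shape=(1,), dtype="int32")
cat = layers.Embedding(N_CATEGORIES, 8)(cat_in)
cat = layers.Flatten()(cat)

# Branch 2 ("NN model 2"): continuous input gets its own dense layer.
num_in = tf.keras.Input(shape=(N_CONTINUOUS,))
num = layers.Dense(16, activation="relu")(num_in)

# "NN model 3": merge both branches and predict.
merged = layers.Concatenate()([cat, num])
h = layers.Dense(32, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid")(h)

model = tf.keras.Model([cat_in, num_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")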
Category: Data Science

Good chromosome representation in a VRPTW genetic algorithm

I have a genetic algorithm for a vehicle routing problem with time windows and I need to implement certain modifications. I am not sure what the best chromosome representation would be. I have tasks which can be divided into 3 sub-tasks with certain ordered time windows; they have to be processed in order, all 3 (they represent collecting certain goods at a storage, delivering them, and returning packaging to another storage). In the crossover part of the algorithm these tasks are combined …
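One possible encoding (purely illustrative, with invented names; not a claim about the best representation) keeps each task's three sub-tasks inside a single atomic gene, so crossover can reorder tasks but never split a task's collect/deliver/return sequence:

import random

def make_chromosome(task_ids):
    """A chromosome is a permutation of atomic task genes;
    each gene carries its fixed internal sub-task order."""
    order = list(task_ids)
    random.shuffle(order)
    return [(tid, ("collect", "deliver", "return")) for tid in order]

def order_crossover(parent_a, parent_b):
    """Order-crossover (OX-style) on task ids; the sub-task
    sequence inside each gene is preserved by construction."""
    n = len(parent_a)
    i, j = sorted(random.sample(range(n), 2))
    segment = parent_a[i:j]
    seg_ids = {gene[0] for gene in segment}
    rest = [gene for gene in parent_b if gene[0] not in seg_ids]
    return rest[:i] + segment + rest[i:]

chrom = make_chromosome(range(5))
child = order_crossover(chrom, make_chromosome(range(5)))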
Category: Data Science

Why are wavelet transforms not scale-equivariant?

One can rely on continuous wavelets to build a multi-resolution analysis that is equivariant ("covariant") under the action of a discrete subgroup of translations. When not downsampled, the multi-resolution analysis of a 1D signal can be seen as a matrix of n x m coefficients, where n is the number of octaves one wants to capture and m is the number of translated wavelets considered in each octave. Equivariance to translation in this case means that a certain translation of the …
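For reference, the two covariances can be written explicitly (a standard identity, with the usual $1/\sqrt{a}$ normalization assumed):

$$ Wx(a,b) = \frac{1}{\sqrt{a}} \int x(t)\, \overline{\psi\!\left(\frac{t-b}{a}\right)}\, dt, \qquad W[x(\cdot-\tau)](a,b) = Wx(a,b-\tau), $$

and for dilation, $W[x(\cdot/s)](a,b) = \sqrt{s}\, Wx(a/s,\, b/s)$. The dilation identity holds over the continuous family of scales; once scales are sampled on a dyadic grid of octaves it survives only for $s = 2^k$, which is one way to see why the discretized analysis is not equivariant under arbitrary rescalings.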
Category: Data Science

Representation Learning - Self-supervision methods that do well with a limited number of classes when

I understand that a contrastive learning approach such as SimCLR has an inherent problem when dealing with a low number of classes (say 2, 3, 5, 6, maybe even 10): the chance of picking a negative sample that has the same label as the image in the positive pair is not low (say, a dog and another dog). Which contrastive learning approaches do better on such problems, where we have, say, 4 classes rather than 1000 (or …
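To make the problem quantitative: assuming $C$ balanced classes and negatives drawn uniformly from the batch, the probability that a random negative shares the anchor's label is approximately

$$ P(\text{false negative}) \approx \frac{1}{C}, $$

so with $C = 4$ roughly a quarter of the negatives collide with the positive class, versus about $0.1\%$ when $C = 1000$.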
Category: Data Science

Output representation for a neural network to learn grid-based game with multiple units

I have a round-based game played on a grid map with multiple units that I would like to control in some fashion using a neural network (NN). All of the units are moved at once. Each unit can move in any of the grid map directions: $up$, $down$, $left$ and $right$. So if we have $n$ units, then the output policy vector of the NN would need $4^n$ entries to represent probabilities, one for each joint move. Note that one move represents actions …
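A common workaround, sketched below under assumed sizes, is to factorize the policy into $n$ independent 4-way softmax heads, giving $4n$ outputs instead of $4^n$ (at the cost of ignoring coordination between units):

import torch
import torch.nn as nn

N_UNITS = 5       # assumed number of units
OBS_DIM = 128     # assumed size of the encoded board state

class FactorizedPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU())
        # One 4-way head per unit: 4*n outputs, not 4**n.
        self.heads = nn.Linear(256, N_UNITS * 4)

    def forward(self, obs):
        logits = self.heads(self.body(obs)).view(-1, N_UNITS, 4)
        # Per-unit distribution over up/down/left/right.
        return torch.softmax(logits, dim=-1)

policy = FactorizedPolicy()
probs = policy(torch.randn(1, OBS_DIM))   # shape (1, N_UNITS, 4)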
Category: Data Science

How can I get the vector of a word using BERT?

I need to get word vectors using BERT, and I found this function that I think should be the one I need:

import torch
import transformers

def get_bert_embed_matrix(sentences):
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model_config = transformers.AutoConfig.from_pretrained('bert-base-uncased', output_hidden_states=True)
    model = transformers.AutoModel.from_pretrained('bert-base-uncased', config=model_config).to(device)
    tokenizer = transformers.AutoTokenizer.from_pretrained('bert-base-uncased')
    model.eval()
    all_embeddings = []
    for sentence in sentences:
        tokenized_text = tokenizer.tokenize(sentence)
        indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
        tokens_tensor = torch.tensor([indexed_tokens]).to(device)
        with torch.no_grad():
            outputs = model(tokens_tensor)
        hidden_states = outputs[2]  # all layers, available via output_hidden_states=True
        # Concatenate the last four hidden layers for each token.
        word_embed = torch.cat([hidden_states[layer] for layer in [-1, -2, -3, -4]], dim=-1)
        all_embeddings.append(word_embed)
    return all_embeddings

Does the method return vectors for sub-word or word …
Category: Data Science

Can anyone please help me understand what a disentangled hierarchical representation is?

I understand that disentangled representations are those in which each dimension represents only one property. For example, say we have pictures of digits whose latent representation Z has Z1 representing which digit, Z2 the slant, and Z3 the thickness. However, I am not able to understand what disentanglement means in the case of a hierarchical representation.
Category: Data Science

Can we consider meta-features of a dataset as its embedding?

While reading some works on meta-learning, I had this doubt: can we consider the meta-features of a dataset as its embedding, given that meta-features are a lower-dimensional representation that also tries to retain properties of the dataset? Embeddings are essentially low-dimensional representations of some high-dimensional concept. Is it fair to use "embeddings" instead of "meta-features"? Or can we use "representation" instead of "meta-features"?
Category: Data Science

Representing user information

I have the task of representing a user feature matrix. I have features like gender, age, etc., but I also have a multi-valued feature called "movies watched", which is essentially another table of movie names watched by that user, each with a numeric duration; the order of the movies does not matter here. Also, the number of movies watched can range from 20 to 300. So what is the best way of representing "movies watched" as a feature vector?
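One straightforward option (a sketch with made-up data) is a fixed-length multi-hot vector over the movie catalogue, optionally weighted by watch duration; mean-pooling learned movie embeddings is the usual alternative when the catalogue is large:

import numpy as np

movie_vocab = {"Alien": 0, "Up": 1, "Heat": 2, "Tron": 3}  # full catalogue

def movies_to_vector(watched):
    """watched: list of (movie_name, minutes_watched) for one user."""
    v = np.zeros(len(movie_vocab))
    for name, minutes in watched:
        v[movie_vocab[name]] = minutes      # duration-weighted multi-hot
    return v

user_vec = movies_to_vector([("Alien", 117), ("Tron", 96)])
# Concatenate with the other per-user features (gender, age, ...).
print(user_vec)   # [117.   0.   0.  96.]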
Category: Data Science

Understanding fastText

fastText is Facebook's open-source software for obtaining word embeddings (the original paper). Given $N$ documents, where document $n$ is represented by a normalized bag of n-gram features $x_n$ (so the corpus is $\{x_1, x_2, \cdots, x_N\}$), the objective the system tries to optimize is $$ -\frac{1}{N} \sum_{n=1}^N y_n \log(f(BA x_n)) $$ where $B$ and $A$ are weight matrices (factorized for performance reasons), $y_n$ is the class label, and $f(\cdot)$ is the softmax function. Despite the empirical gains reported in the paper, I find this formulation quite unusual …
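The formulation amounts to a linear classifier over averaged n-gram embeddings; a minimal PyTorch sketch of $f(BAx_n)$, with all sizes assumed:

import torch
import torch.nn as nn

VOCAB = 10_000    # assumed n-gram vocabulary size
DIM = 64          # assumed embedding dimension (rank of the factorization)
CLASSES = 5       # assumed number of labels

# A: averages n-gram embeddings (EmbeddingBag does lookup + mean).
A = nn.EmbeddingBag(VOCAB, DIM, mode="mean")
# B: maps the document embedding to class scores.
B = nn.Linear(DIM, CLASSES)

ngram_ids = torch.tensor([3, 17, 256, 42])       # one document's n-grams
offsets = torch.tensor([0])                      # batch of one document
logits = B(A(ngram_ids, offsets))                # = B A x_n
log_probs = torch.log_softmax(logits, dim=-1)    # log f(.)
loss = -log_probs[0, 2]                          # NLL term for true label 2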
Category: Data Science

How to represent a "switch"-like behavior in a neural network?

I have three input variables $x_1$, $x_2$ and $d$, where $x_1$ and $x_2$ are numerical variables and $d$ is a dummy variable that takes the value of 1 or 2. How to represent the part of a neural network in the black box so that when $d=1$, $x_1$ and $x_2$ are sent to layer $T_1$ for transformation, and when $d=2$, $x_1$ and $x_2$ are sent to layer $T_2$ for transformation?
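One way to implement such a switch (a sketch; soft gating is one common choice, not the only one) is to compute both transformations and select between them with a mask derived from $d$, which keeps everything differentiable:

import torch
import torch.nn as nn

class SwitchedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.T1 = nn.Linear(2, 8)   # transformation used when d == 1
        self.T2 = nn.Linear(2, 8)   # transformation used when d == 2
        self.head = nn.Linear(8, 1)

    def forward(self, x, d):
        # x: (B, 2) holding [x1, x2]; d: (B,) with values 1 or 2.
        gate = (d == 1).float().unsqueeze(1)          # 1 -> T1, 0 -> T2
        h = gate * torch.relu(self.T1(x)) + (1 - gate) * torch.relu(self.T2(x))
        return self.head(h)

net = SwitchedNet()
out = net(torch.randn(4, 2), torch.tensor([1, 2, 2, 1]))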
Category: Data Science

How to JUST represent words as embeddings with a pretrained BERT?

I don't have enough data (i.e. I don't have enough texts); I have only around 4k words in my dictionary. I need to compare given words, so I need to represent each one as an embedding. After representing the words I want to cluster them and find similar vectors (i.e. words). Maybe even then classify them into given classes (the classification here is unsupervised, since I don't have labeled data to train on). I know that almost any task can be …
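Once each word has a vector (for instance from a pretrained BERT encoder), clustering and similarity search need no labels; a sketch assuming a (4000, 768) embedding matrix (random numbers stand in for the real vectors):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4000, 768))    # stand-in for BERT vectors

# Unsupervised grouping of the 4k-word vocabulary.
labels = KMeans(n_clusters=20, n_init=10).fit_predict(embeddings)

# Most similar words to word 0 by cosine similarity (index 0 is itself).
sims = cosine_similarity(embeddings[:1], embeddings)[0]
nearest = np.argsort(-sims)[1:6]
print(labels[:10], nearest)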
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.