I have a dataset of a couple of EV charging stations (10 min frequency) over 1 year. This data consists of lots of 0s, since there is no continuous flow of cars coming to charge, but rather recurring charging events that appear as peaks (for example, 7-9 am seems to be a frequent charging timeframe, when people arrive at the office). I have also aggregated weather and weekday/holiday data to be used as features. I now wish to predict the energy …
I'm wondering whether the approach I have in mind could even work. I want to use dictionary learning for image classification. The first step would be to learn a dictionary from a set of similar yet different images, so that the background can be extracted from an image. For example, I have a set of images (e.g., 500 photos) of the same object, but the scenes differ (lighting, the angle the photo was taken at, etc.). Basically, the main object is the …
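As a rough illustration of the first step described above, here is a minimal sketch of learning a patch dictionary with scikit-learn's MiniBatchDictionaryLearning. The image array, patch size, number of atoms and all other parameter values are placeholders, not anything implied by the question:

    # Hypothetical sketch: learn a dictionary of 8x8 patches from a stack of
    # grayscale images; "images", patch size and all parameters are placeholders.
    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning
    from sklearn.feature_extraction.image import extract_patches_2d

    images = [np.random.rand(64, 64) for _ in range(20)]  # stand-in for the ~500 photos

    patches = np.vstack([
        extract_patches_2d(img, (8, 8), max_patches=200, random_state=0).reshape(-1, 64)
        for img in images
    ])
    patches -= patches.mean(axis=0)  # centre the patches

    dico = MiniBatchDictionaryLearning(n_components=100, alpha=1.0,
                                       batch_size=256, random_state=0)
    dictionary = dico.fit(patches).components_  # 100 learned atoms of size 8x8

    # Sparse codes for a new image's patches; a reconstruction from these codes
    # could then be compared against the original to separate the background.
    codes = dico.transform(patches[:10])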
I have a pandas data frame with about a million rows and 3 columns. The columns are of 3 different datatypes: NumberOfFollowers is numerical, UserName is categorical, and Embeddings is a categorical-set type.

df:

    Index  NumberOfFollowers  UserName  Embeddings     Target Variable
    0      15                 name1     [0.5 0.3 0.2]  0
    1      4                  name2     [0.4 0.2 0.4]  1
    2      8                  name3     [0.5 0.5 0.0]  0
    3      10                 name1     [0.1 0.0 0.9]  0
    ...    ...                ...       ...            ..

I would …
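A minimal sketch of the frame described above, with one possible way of turning it into purely numeric features (expanding the Embeddings column and one-hot encoding UserName); the expansion step is an assumption about the intent, since the question is truncated:

    # Toy reproduction of the described frame, with an assumed preprocessing step.
    import pandas as pd

    df = pd.DataFrame({
        "NumberOfFollowers": [15, 4, 8, 10],
        "UserName": ["name1", "name2", "name3", "name1"],
        "Embeddings": [[0.5, 0.3, 0.2], [0.4, 0.2, 0.4], [0.5, 0.5, 0.0], [0.1, 0.0, 0.9]],
        "Target": [0, 1, 0, 0],
    })

    # Expand the embedding vectors into separate numeric columns (assumed intent).
    emb = pd.DataFrame(df["Embeddings"].tolist(),
                       columns=[f"emb_{i}" for i in range(3)])
    X = pd.concat([df[["NumberOfFollowers"]],
                   pd.get_dummies(df["UserName"], prefix="user"),
                   emb], axis=1)
    y = df["Target"]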
Suppose I have a continuous response variable y and a very large matrix of Boolean sparse predictor variables X. What would be the best regression method to use?
I have been reading about weight sparsity and activity sparsity with regard to convolutional neural networks. Weight sparsity I understand as having more trainable weights be exactly zero, which essentially means having fewer connections, allowing for a smaller memory footprint and quicker inference on test data. Additionally, it should help against overfitting (which I understand in terms of smaller weights leading to simpler models / Ockham's razor). From what I understand now, activity sparsity is analogous in that it would lead …
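One way to make the distinction concrete is the standard L1 regularization mechanism in Keras: a penalty on the weights encourages weight sparsity, while a penalty on a layer's output encourages activity sparsity. The sketch below is only an illustration of that difference; the layer sizes and penalty strengths are arbitrary placeholders:

    # Illustrative only: L1 on the kernel pushes weights toward zero (weight
    # sparsity); L1 on the layer output pushes activations toward zero
    # (activity sparsity).
    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    model = keras.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1),
                      kernel_regularizer=regularizers.l1(1e-4)),    # weight sparsity
        layers.Conv2D(32, 3, activation="relu",
                      activity_regularizer=regularizers.l1(1e-4)),  # activity sparsity
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")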
Hopefully I'm in the right place for my question: I'm looking for suggestions for models to classify multivariate time series. I'm trying to find a way of classifying the behaviour of motors into "good" or "bad" based on current measurements. I found many possible examples to use (for example, in the library sktime), but my biggest problem is that the dataset I have captured is incredibly small because of difficulties in the testing environment. The dataset …
I am experimenting with a dimensionality reduction step prior to clustering for a pretty large sparse binary matrix of almost 3,000 columns and 50,000 rows. My idea is to embed the 3,000 dimensions into a two-dimensional space with UMAP and then cluster the resulting 50,000 two-dimensional points with HDBSCAN. I've found that UMAP accepts a number of options, such as the metric, n_neighbors, min_dist and spread, but I cannot figure out what the best combination would be to give me distinct …
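For reference, a bare-bones sketch of the pipeline described above with the umap-learn and hdbscan packages. The parameter values are only starting-point guesses (e.g., the Jaccard metric is often suggested for binary data), not known-good settings, and the toy matrix stands in for the real 50,000 x 3,000 data:

    # Sketch of UMAP -> HDBSCAN on a binary matrix; all parameters are guesses.
    import numpy as np
    import umap
    import hdbscan

    X = np.random.binomial(1, 0.05, size=(5000, 300))  # toy stand-in

    embedding = umap.UMAP(n_components=2,
                          metric="jaccard",   # a common choice for binary features
                          n_neighbors=30,
                          min_dist=0.0,
                          random_state=42).fit_transform(X)

    labels = hdbscan.HDBSCAN(min_cluster_size=50,
                             min_samples=10).fit_predict(embedding)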
I have a matrix with sparse data; a small extract from it is shown below. The columns represent years and the rows represent different race tracks. The feature values are velocities on that specific track in a specific year. Generally the velocity increases with the year, but that is not necessarily true. As seen below, the matrix is sparse, and for some tracks I only have values for a single year. How can one most accurately predict the missing values? I …
I have a sparse matrix, $X$, created by TfidfVectorizer; its size is $(500000, 200000)$. I want to convert $X$ to a data frame, but I always get a memory error. I tried pd.DataFrame(X.toarray(), columns=tokens) and pd.read_csv(X.toarray().astype("float32"), columns=tokens, chunksize=...), and it seems that whenever I convert $X$ to a numpy array using X.toarray(), I get an error. Can someone tell me an easy solution for this? Is there any way I can create a sparse dataframe from $X$ without …
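Pandas does support sparse-backed frames built directly from a scipy matrix, which avoids materializing the dense array; a minimal sketch (assuming a reasonably recent pandas and scikit-learn, and toy documents in place of the real corpus):

    # Sketch: build a sparse-backed DataFrame from the scipy matrix without toarray().
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["some example text", "another example document"]  # placeholder corpus
    vec = TfidfVectorizer()
    X = vec.fit_transform(docs)            # scipy CSR matrix
    tokens = vec.get_feature_names_out()

    df = pd.DataFrame.sparse.from_spmatrix(X, columns=tokens)
    print(df.sparse.density)               # fraction of explicitly stored values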
There are 4 datasets (all in csv format), each with a uniqueID column by which each record can be identified. The image and text datasets are dense (they need to be converted to ndarrays). Can someone suggest how to use all 4 of these datasets to build a regression model? This is how the datasets look. Metadata, with some input features and the target variable (views):

    uniqueID  ad_blocked  embed  duration  language  hour  views
    1         True        True   68        3         10    244
    2         False       True   90        1         …
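One straightforward way to use all four tables is to join them on uniqueID before modelling; a rough sketch with pandas, where the file names and the non-metadata column contents are placeholders, not anything given in the question:

    # Hypothetical sketch: join the four csv files on uniqueID, then split X/y.
    import pandas as pd

    meta  = pd.read_csv("metadata.csv")         # includes the target "views"
    image = pd.read_csv("image_features.csv")   # placeholder for the image table
    text  = pd.read_csv("text_features.csv")    # placeholder for the text table
    other = pd.read_csv("other_features.csv")   # placeholder for the fourth table

    df = (meta.merge(image, on="uniqueID", how="inner")
              .merge(text,  on="uniqueID", how="inner")
              .merge(other, on="uniqueID", how="inner"))

    y = df["views"]
    X = df.drop(columns=["uniqueID", "views"]).to_numpy()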
I have a big dataset with a column "clientid" and a categorical column "choice". I want to find out which clients have strange (i.e., less frequent) combinations of choices, and to be able to identify new strange combinations of future clients immediately.

    clientid  choice
    cl1       a
    cl2       b
    cl2       c
    cl3       d
    cl4       b
    cl4       c

If I transpose the table by clientid, I have a row for each client and different columns based on …
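For the pivoting step described above, a small sketch of how the transposed table might be built with pandas and how often each exact combination occurs could then be counted; the rarity-by-frequency part is only one possible reading of the truncated question:

    # Sketch: one row per client, one indicator column per choice, then count
    # how many clients share each exact combination so rare ones can be flagged.
    import pandas as pd

    df = pd.DataFrame({"clientid": ["cl1", "cl2", "cl2", "cl3", "cl4", "cl4"],
                       "choice":   ["a",   "b",   "c",   "d",   "b",   "c"]})

    wide = (pd.crosstab(df["clientid"], df["choice"]) > 0).astype(int)

    # Frequency of each combination across clients (low counts = unusual).
    combo_counts = wide.groupby(list(wide.columns)).size()
    print(wide)
    print(combo_counts)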
I'm following a guide here to implement image segmentation in Keras. One thing I'm confused about is these lines:

    # Ground truth labels are 1, 2, 3. Subtract one to make them 0, 1, 2:
    y[j] -= 1

The ground truth targets are .png files with either 1, 2 or 3 at a particular pixel position to indicate the following pixel annotations: 1: Foreground, 2: Background, 3: Not classified. When I remove this -1, my sparse_categorical_crossentropy values come out as nan during …
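For context, sparse_categorical_crossentropy expects integer class indices in the range [0, num_classes); a tiny illustration of what the shift does (the out-of-range class "3" landing outside the 3 softmax outputs is a plausible source of the nan losses, though the guide itself does not spell that out):

    # With 3 classes, Keras expects labels 0, 1, 2; the raw masks use 1, 2, 3.
    import numpy as np

    y = np.array([[1, 2], [3, 1]], dtype="uint8")   # raw mask values 1..3
    y = y - 1                                       # now 0..2, valid class indices
    assert y.min() >= 0 and y.max() <= 2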
I am looking for a metric for comparing gene count tables. These are long columns of data (a few million genes by a few dozen samples), with all non-negative entries, about 90% of which are zeros. The goal is to compare the performance of several tools/algorithms that these tables originate from, by comparing the resulting tables among themselves or with the expected counts (in the case of simulated data). In principle, one compares on a sample-by-sample basis, but comparing different …
Given that I have a very sparse data matrix with continuous features, like this dataframe for example:

    Feature_A  Feature_B  Feature_C  ...  Feature_Z
    0.3        0          0.1             0
    0.5        0.5        0               0
    0          0          1.0             0
    1.0        0          0               0
    0.7        0          0               0
    1.0        0          0               0
    0.1        0          0.22            0.43

what is the best way to perform unsupervised anomaly detection on this kind of data? My initial idea was to perform some kind of dimensionality reduction first (e.g., SVD or NMF) and then …
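A small sketch of the first step described above, using TruncatedSVD (which operates on sparse input directly), followed by one possible detector on the reduced representation; IsolationForest is only an example choice, not something implied by the truncated question:

    # Sketch: reduce the sparse matrix, then score points in the reduced space.
    from scipy import sparse
    from sklearn.decomposition import TruncatedSVD
    from sklearn.ensemble import IsolationForest

    X = sparse.random(1000, 26, density=0.1, format="csr", random_state=0)  # toy data

    Z = TruncatedSVD(n_components=10, random_state=0).fit_transform(X)
    scores = IsolationForest(random_state=0).fit(Z).decision_function(Z)
    # lower scores = more anomalous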
I really do not understand what this code does:

    M = sparse.coo_matrix(([1]*n, (Y, range(n))), shape=(k,n)).toarray()

The code is related to calculating the sparse term in this equation, but I am really confused; I do not know how it iterates, or what the following are:

1. sparse.coo_matrix
2. (Y, range(n))
3. shape=(k,n)).toarray()

Also, what exactly does this term mean in the equation, and how should it be interpreted in code? Thank you, and please forgive my poor English.
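A small worked example of the expression: coo_matrix((data, (rows, cols)), shape) places data[i] at position (rows[i], cols[i]), so this builds a k x n matrix with a 1 at (Y[i], i), i.e., column i is the one-hot encoding of label Y[i]. If the missing term in the equation is the usual indicator 1{y(i) = j} from a softmax/cross-entropy loss (the image did not survive here, so this is an assumption), the matrix is exactly that indicator with classes as rows and samples as columns:

    import numpy as np
    from scipy import sparse

    Y = np.array([2, 0, 1, 2])   # class label of each of the n samples
    n = len(Y)                   # n = 4 samples
    k = 3                        # k = 3 classes

    # Places a 1 at (Y[i], i) for every i; everything else stays 0.
    M = sparse.coo_matrix(([1] * n, (Y, range(n))), shape=(k, n)).toarray()
    print(M)
    # [[0 1 0 0]
    #  [0 0 1 0]
    #  [1 0 0 1]]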
I am attempting to train an autoencoder on data that is extremely sparse. Each datapoint consists only of zeros and ones and contains ~3% ones. Because the data is mostly zero, the autoencoder learns to guess zero every time. Is there a way to prevent this from happening? For context, this is extremely sparse data when you consider that the number of features is over 865,000.
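One commonly suggested adjustment for this situation (included here only as an illustrative sketch, not a confirmed fix) is a reconstruction loss that up-weights the rare 1s, so predicting all zeros is no longer a cheap solution. The weight of 30 (roughly 1/0.03) and the tiny layer sizes below are placeholders:

    # Sketch: autoencoder with a positively weighted binary cross-entropy.
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    def weighted_bce(pos_weight=30.0):
        def loss(y_true, y_pred):
            eps = keras.backend.epsilon()
            y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
            # Errors on the 1s cost pos_weight times more than errors on the 0s.
            per_element = -(pos_weight * y_true * tf.math.log(y_pred)
                            + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
            return tf.reduce_mean(per_element, axis=-1)
        return loss

    inputs = keras.Input(shape=(1000,))            # stand-in for the ~865k features
    encoded = layers.Dense(64, activation="relu")(inputs)
    decoded = layers.Dense(1000, activation="sigmoid")(encoded)
    autoencoder = keras.Model(inputs, decoded)
    autoencoder.compile(optimizer="adam", loss=weighted_bce(30.0))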
I'm very new to data science and still trying to get to grips with it. The problem I'm trying to tackle is this: we have a pool of footballers from a league, and data objects representing a group of 11 footballers for a given match and the number of goals scored by that team in that match. The goal is to estimate the number of goals that could potentially be scored by any random line-up of footballers from this pool. This …
I was reading this article, https://www.di.ens.fr/~aspremon/PDF/CovSelSIMAX.pdf, whose goal is to estimate the covariance matrix from the sample covariance matrix drawn from a distribution $X$: "Given a sample covariance matrix, we solve a maximum likelihood problem penalized by the number of nonzero coefficients in the inverse covariance matrix. Our objective is to find a sparse representation of the sample data and to highlight conditional independence relationships between the sample variables." The likelihood problem is only for the case where …
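For reference, and as I read the paper's setup (constants dropped, so this is a paraphrase rather than a quoted formula), the penalized maximum-likelihood problem it describes is roughly

$$\max_{X \succ 0} \; \log\det X - \operatorname{tr}(SX) - \rho\,\mathbf{Card}(X),$$

where $S$ is the sample covariance matrix, $X$ the estimate of the inverse covariance matrix, $\mathbf{Card}(X)$ the number of nonzero entries of $X$, and $\rho > 0$ the penalty weight; the paper then works with a convex relaxation that replaces $\mathbf{Card}(X)$ by $\sum_{ij} |X_{ij}|$.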
I have a largely uncorrelated feature space of about 40 dichotomous features, with which I'm trying to predict a continuous target variable. Some of these features are very sparse (active less than 10% of the time, with zeros the rest of the time), but the few times that these features are active they may be really good predictors of the target. In most algorithms, these features will be mostly ignored because of how sparse they are, despite their predictive ability. What …