aggregation

Feature Selection on Aggregated Target Data

Alexander Fratzer

May 21, 2022 11:03

I have a question about feature selection on a dataset where the target variable is aggregated by the sum of different data points. I want to predict the number of sales depending on a variety of features like: week, price per unit, store country, store city, 2-3 other categorical meta-data, other features. I am aware that this data should be interpreted as time series but because of the lack of available historical data, no model can compete with the naive …

Topic: aggregation time-series feature-selection machine-learning

Category: Data Science

Supervised learning on sources of information with different importance

Toby

May 10, 2022 19:01

I am trying to classify customer support sessions using supervised machine learning. In each customer support session I have 3 bags of information: 1. The title of the customer's complaint; 2. Information about the device the customer was using; 3. Text of the chat session with the customer support agent. In each customer support session, there are 6 different classes. Is it better to: 1. Train a classifier on each bag of information and have them vote on which class …

Topic: aggregation feature-selection nlp machine-learning

Category: Data Science

Learning the Average of a 0/1 Dependent Variable

Ami Tavory

May 10, 2022 07:11

Suppose I have a matrix $X$ and a dependent vector $y$ whose entries are each in {0,1}, dependent on the corresponding row of $X$. Given this dataset, I'd like to learn a model, so that given some other dataset $X'$, I could predict average($y'$) of the dependent-variable vector $y'$. Note that I'm only interested in the response on the aggregate level of the entire dataset. One way of doing so would be to train a calibrated binary classifier $X \to y$, apply it to $X'$, …

Topic: theory aggregation classification

Category: Data Science
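
For the question above on learning the average of a 0/1 variable, here is a minimal sketch of one reading of the calibrated-classifier route: average the per-row predicted probabilities instead of averaging thresholded labels. The excerpt is cut off before it says what is done with the predictions, so the averaging step is an assumption, and `probs_new` is hypothetical illustrative data, not output from any real model.

```python
from statistics import fmean


def predicted_average(probabilities):
    """Estimate the mean of a 0/1 target as the mean of predicted probabilities."""
    return fmean(probabilities)


# Hypothetical calibrated probabilities P(y = 1 | row) for the rows of X'.
probs_new = [0.1, 0.2, 0.3, 0.6]

# Aggregate estimate from the probabilities themselves.
estimate = predicted_average(probs_new)

# For comparison: thresholding at 0.5 first and then averaging the hard labels.
hard_labels = [1 if p >= 0.5 else 0 for p in probs_new]
thresholded = fmean(hard_labels)

print(estimate, thresholded)
```

The two numbers differ here, which is why the aggregate estimate depends on calibration and not only on classification accuracy.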

How can we predict a value after several rows of data?

Aneeq

May 4, 2022 18:48

I have a regression problem in which for each week I have several rows (the number varies by week, e.g. one week might have 1800 rows and another might have 5000 rows). My target is to predict a value at the end of each week's data. Here's an example of what I need: x is a feature, y is the target. Week 1 ; x1, x2, x3.. x90 Week 1 ; v1, v2, v3... v90 .... 100 more rows Week 1 ; …

Topic: multi-instance-learning aggregation time-series data-cleaning machine-learning

Category: Data Science

How to deal with a potentially multiple categorical variable

Diogo Santos

April 1, 2022 17:05

I'm building a model that has, as inputs, some categorical variables. I have already dealt with this sort of data before, and applied different techniques such as creation of dummy variables and factor scoring. However, I now have a different type of problem for which I cannot see an obvious best answer. For each individual we can have multiple instances of this categorical variable $X$. When such cases happen on numerical variables I usually take the max/mean/min depending on context. …

Topic: dummy-variables feature-engineering aggregation categorical-data

Category: Data Science
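
For the question above, one possible way to collapse several instances of a categorical variable per individual is a count (multi-hot) encoding: one column per category, holding how often that category occurs for the individual. This is only a sketch of one option, not necessarily the best answer the asker is looking for; the function name, the individual IDs, and the category labels are all hypothetical.

```python
from collections import Counter


def encode_multi_categorical(records, categories):
    """Turn (individual, category) pairs into one count vector per individual."""
    counts = {}
    for individual, category in records:
        counts.setdefault(individual, Counter())[category] += 1
    # A Counter returns 0 for categories the individual never had.
    return {ind: [c[cat] for cat in categories] for ind, c in counts.items()}


# Hypothetical data: u1 has three instances of X, u2 has one.
records = [("u1", "A"), ("u1", "B"), ("u1", "A"), ("u2", "C")]
categories = ["A", "B", "C"]

encoded = encode_multi_categorical(records, categories)
print(encoded)
```

Replacing each count with `min(count, 1)` would give a plain presence/absence encoding, the categorical analogue of taking the max.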

I want to be able to collapse and sum values dependent on the gene name

Mark Hickling

February 8, 2022 02:05

I have a table that looks like this: I want to add together all the values for each gene for each column. For example, for LINC01128, it should read:

            ConN1  ConN2  ConN3  StN1  StN2  StN3
LINC01128 :    22     14     37    34    54    67

My table is very long and this would need to be done for all the genes.

Topic: aggregation r

Category: Data Science
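
The question above is tagged r; the following Python sketch only illustrates the group-and-sum logic it asks for. The original table is not shown in the excerpt, so the input rows are hypothetical values chosen so that the LINC01128 totals match the ones quoted in the question, and `GENE_X` is an invented second gene.

```python
from collections import defaultdict

COLUMNS = ["ConN1", "ConN2", "ConN3", "StN1", "StN2", "StN3"]


def sum_by_gene(rows):
    """Sum each column over all rows that share the same gene name."""
    totals = defaultdict(lambda: [0] * len(COLUMNS))
    for gene, values in rows:
        for i, value in enumerate(values):
            totals[gene][i] += value
    return dict(totals)


# Hypothetical input: two rows for LINC01128, one for another gene.
rows = [
    ("LINC01128", [10, 4, 17, 14, 24, 30]),
    ("LINC01128", [12, 10, 20, 20, 30, 37]),
    ("GENE_X", [1, 2, 3, 4, 5, 6]),
]

totals = sum_by_gene(rows)
print(totals["LINC01128"])
```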

Tableau: keeping results independent of view / filter

yurnero

December 27, 2021 05:04

I am using Tableau Desktop 2021.1.4. Suppose that my source sales data consists of 4 columns: Region (dimension with values: N,E,W,S), Type (dimension with values: Furniture, Electronics, Appliances), Year (dimension with values: 2021, 2020, 2020), and sales ($). I would like to generate a Calculated Field, say "Sum of Sales", where the summation: is always over all the regions and all the types, regardless of what is in the view; can also be over the different years or can be …

Topic: aggregation tableau

Category: Data Science

Aggregating transactional data for customer segmentation

user1636588

November 12, 2021 15:39

I have item-level transactional data where each row in the data represents a different item bought by a customer in a transaction (so if two different items were bought in the same transaction by the same customer there would be two rows where the customer_id and the transaction_id columns have the same value). E.g.:

Customer_id  transaction_id  item_bought  quantity
a            00001           cheese       2
b            00002           ham          1
b            00002           pepsi        2

In this case customer b bought two items in the …

Topic: machine-learning-model aggregation data-cleaning clustering machine-learning

Category: Data Science
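
As a sketch of how the item-level rows in the question above could be rolled up to one record per customer before clustering, the snippet below uses the three example rows from the excerpt. The choice of features (number of transactions, number of distinct items, total quantity) and the function name are assumptions for illustration; the question does not say which aggregates are wanted.

```python
def customer_features(rows):
    """Aggregate item-level rows into one feature record per customer."""
    acc = {}
    for customer_id, transaction_id, item, quantity in rows:
        entry = acc.setdefault(
            customer_id, {"transactions": set(), "items": set(), "quantity": 0}
        )
        entry["transactions"].add(transaction_id)
        entry["items"].add(item)
        entry["quantity"] += quantity
    return {
        cid: {
            "n_transactions": len(e["transactions"]),
            "n_distinct_items": len(e["items"]),
            "total_quantity": e["quantity"],
        }
        for cid, e in acc.items()
    }


# The example rows from the question.
rows = [
    ("a", "00001", "cheese", 2),
    ("b", "00002", "ham", 1),
    ("b", "00002", "pepsi", 2),
]

features = customer_features(rows)
print(features)
```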

Python Pandas agg error

4Walk

August 9, 2021 02:04

I am trying to generate descriptive statistics using the agg function in pandas. I am having trouble with one line that uses a lambda function. The lines work when I run them separately, but when I combine them into a single line I get errors. Any guidance is much appreciated. The following two lines of code work when I run them individually. First line of code: bh_df.groupby('CAT.MEDV').agg( avg_Nox=('NOX', 'mean')) Second line, with the lambda function: bh_df.groupby('CAT.MEDV').agg( rng=("NOX", lambda x: (max(x) …

Topic: python-3.x aggregation pandas

Category: Data Science

Labeling and aggregating features issue

cryp

August 6, 2021 21:40

I am trying to build a simple binary classifier (some tree-based algorithm for now) and my training data will have features aggregated at the user level, so I'll have a unique record for each user. These aggregated features are like "number of logged in sessions", "number of times profile button was clicked" etc. Essentially these are website browsing behavior features. What I am trying to predict is whether someone would be interested in subscribing or not. Some users might …

Topic: labels aggregation xgboost random-forest predictive-modeling

Category: Data Science

Concatenating Data in two years

W.314

July 26, 2021 09:05

I have to use a machine learning model to predict the electricity consumption and carbon emission based on some buildings' features (area, year of construction ...). Here is the link to the data. The problem is that I have data from 2 years, 2015 and 2016; for each year I have some buildings and the mean of consumption and emission. I'm wondering what is the best way to concatenate the data. Since there are some buildings that are registered only …

Topic: aggregation dataset data-cleaning

Category: Data Science

How do you aggregate features of lists (pooling alternatives)?

janniks

July 13, 2021 14:27

Is it possible to reduce non-correlated multi-dimensional data over features to 1D data? A working option is pooling (mean/min/max) over an embedding vector (n samples of embeddings of m dimensions). E.g., this converts many embeddings (n × m) to a list of means (1 × m). However, these all lose a lot of information (especially the relationships between features in single embeddings). This doesn't have to be a reduction (i.e. the resulting 1D vector can be larger than m). If it's …

Topic: pooling feature-engineering aggregation feature-construction dimensionality-reduction

Category: Data Science
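
The pooling described in the question above can be sketched as follows: each of mean, min, and max reduces an n × m list of embeddings to 1 × m, and concatenating the three gives a 1D vector of length 3m, which is one simple case of a result larger than m. The embeddings are hypothetical toy values and the function name is illustrative.

```python
from statistics import fmean


def pool(embeddings):
    """Concatenate mean, min, and max pooling over n embeddings of m dimensions."""
    columns = list(zip(*embeddings))  # m tuples, each holding n values
    mean_pool = [fmean(col) for col in columns]
    min_pool = [min(col) for col in columns]
    max_pool = [max(col) for col in columns]
    return mean_pool + min_pool + max_pool


# Hypothetical data: n = 2 embeddings with m = 2 dimensions each.
embeddings = [[1.0, 4.0], [3.0, 0.0]]

pooled = pool(embeddings)
print(pooled)
```

Each pooled statistic is still computed per dimension, so relationships between features inside a single embedding are not captured, which is the limitation the question points out.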

How to aggregate data inserted by users to avoid outliers?

zzzz

May 3, 2021 20:28

I'm developing a new application based on machine learning. In this application users can insert new data to improve the prediction system. As you may guess, users could insert data that doesn't make sense, generating in this way outliers that may harm the prediction accuracy. I'm pretty new to this field so I would like to ask you: do you know any strategy to mitigate this? Maybe by implementing a voting or aggregating system? In that case, do you have …

Topic: aggregation data outlier

Category: Data Science

MongoDB Groupby Rank

Noob

March 22, 2021 14:28

I'm working with MongoDB and wanted to do a query using the aggregate function. The query is: each city has several zip codes. Find the city in each state with the most zip codes and rank those cities along with the states using the city populations. The documents are in the following format: { "_id": "10280", "city": "NEW YORK", "state": "NY", "pop": 5574, "loc": [ -74.016323, 40.710537 ] } I was able to count the number of zip codes for each state …

Topic: groupby python-3.x aggregation ranking mongodb

Category: Data Science

Aggregating standard deviations

Nakeuh

March 7, 2021 07:33

Imagine I have a collection of data, let's say the travel time for a road segment. On this collection I want to calculate the mean and the standard deviation. Nothing hard so far. Now imagine that instead of having my collection of values for one road segment, I have multiple collections of values that correspond to the multiple sub segments that compose the road segment. For each of these collections, I know the average and the standard deviation. From that, …

Topic: aggregation

Category: Data Science
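
The excerpt above is cut off before it states the exact goal, so the sketch below rests on two loudly stated assumptions: the travel time of the whole road segment is the sum of the sub-segment travel times, and those sub-segment times are independent. Under those assumptions the means add and the variances add, so the combined standard deviation is the square root of the summed squared standard deviations. The numbers are hypothetical.

```python
import math


def combine_independent_segments(means, stds):
    """Mean and std of a sum of independent sub-segment travel times."""
    total_mean = sum(means)
    # Variances add only under independence (an assumption, see lead-in).
    total_std = math.sqrt(sum(s * s for s in stds))
    return total_mean, total_std


# Hypothetical sub-segment statistics, e.g. in seconds.
means = [10, 20]
stds = [3, 4]

total_mean, total_std = combine_independent_segments(means, stds)
print(total_mean, total_std)
```

If the sub-segment times are correlated (for example, congestion affecting all of them at once), covariance terms would have to be added and this formula underestimates or overestimates the spread.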

Using R to organize/rearrange CSV - group by multiple columns?

SpookyDLX

February 10, 2021 13:22

I have a CSV that I need to clean up / organize in a usable way using R. I need to group by the property ID and then want to take all the unique years for the defor year column and make each year into a separate column with the amount of deforestation for that year. My data frame / CSV looks like this: Prop_ID deforYear deforHA 1 2010 15 1 2011 0 1 2012 10 2 2010 35 2 …

Topic: aggregation r

Category: Data Science
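
The question above asks for R; this Python sketch only illustrates the long-to-wide reshaping it describes, using the four complete rows visible in the excerpt (the fifth row is truncated and is left out). Representing missing year/property combinations as `None` and summing duplicate rows are assumptions, as is the function name.

```python
def pivot_wide(rows):
    """Reshape (Prop_ID, deforYear, deforHA) rows into one row per property."""
    years = sorted({year for _, year, _ in rows})
    wide = {}
    for prop_id, year, hectares in rows:
        # Start every property with all year columns set to None (missing).
        row = wide.setdefault(prop_id, dict.fromkeys(years))
        row[year] = (row[year] or 0) + hectares
    return years, wide


# The complete rows visible in the question.
rows = [
    (1, 2010, 15),
    (1, 2011, 0),
    (1, 2012, 10),
    (2, 2010, 35),
]

years, wide = pivot_wide(rows)
for prop_id, row in wide.items():
    print(prop_id, [row[y] for y in years])
```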

R: Calculations based on frequencies / grouped / aggregate data

joffdd

February 1, 2021 20:11

I am trying to do simple calculations in R when only grouped data with frequencies is available, not the raw data. This is the case when I have a large number of records in a database, say a large SQL table, and then for given reasons use GROUP BY and COUNT to aggregate instead of downloading the original table for analysis in R. As I understand, one could say in R that I'm talking about data in a table format. To …

Topic: data-table aggregation r

Category: Data Science
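
For the question above, simple statistics can be computed directly from (value, count) pairs such as a GROUP BY / COUNT query returns, without expanding them back to raw rows. The sketch below is in Python, not R, and shows a frequency-weighted mean and population variance; the pairs are hypothetical and the function name is illustrative.

```python
def grouped_mean_and_variance(pairs):
    """Mean and population variance from (value, frequency) pairs."""
    n = sum(count for _, count in pairs)
    mean = sum(value * count for value, count in pairs) / n
    variance = sum(count * (value - mean) ** 2 for value, count in pairs) / n
    return n, mean, variance


# Hypothetical result of SELECT value, COUNT(*) ... GROUP BY value.
pairs = [(1, 2), (2, 3), (4, 5)]

n, mean, variance = grouped_mean_and_variance(pairs)
print(n, mean, variance)
```

Dividing by `n - 1` in the variance line would give the sample variance instead of the population variance.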

How to get a (descriptive) overview of a large database?

Ben

December 9, 2020 20:08

I'm facing a data framework with ~20k observations and 151 variables across 2078 subjects. At first I am primarily interested in what the data looks like in relation to a single parameter. But I cannot plot 2078 subjects on the x-axis and make a bar plot out of it or so. What would be useful methods for such a situation? I prefer some visualizations but I think they won't be applicable. I'm afraid even non-visualization methods are not really …

Topic: aggregation descriptive-statistics ggplot2 visualization r

Category: Data Science

Heatmap of large 2D array using datashader and plotly

andins

November 25, 2020 13:08

I'm trying to show a heatmap of a large 2D array (160x250000 entries). This should go into a dash app, so I'm using plotly to deal with graphics, and my idea was to use datashader for performance, but I'm having trouble getting it right. Even independently of dash, I'm already having problems with plotly + datashader (see code below). There is probably something very basic I'm not understanding in this process. It would be great if someone could tell me …

Topic: plotly heatmap aggregation python

Category: Data Science

Data system that manages aggregates over time intervals

naftalimich

October 21, 2020 00:23

I am looking to know if there is a data system that handles the following use case. To keep it simple, the data is a set of homogeneous entities E. E contains named numeric properties that the app code increments as the case may be over the life cycle of the application. I will want to query the state of E for a set of time intervals. To keep it simple, let's say this is today, this month, this year …

Topic: data-analysis aggregation

Category: Data Science
