I have a set of tables containing a few thousand entries and a few tens of columns of machine status values from production. The entries are of mixed types such as string, float, or timestamp. Each table is pre-labeled with a certain failure mode (e.g. valve setting jump, the problem with inlet A, etc.). This could be due to a jump in the mean values of some columns or a special correlation between several columns. This is what I refer to as a …
I have the following data frame:

      Date.POSIXct         Date       WeekDay DayCategory Hour Holidays value
    1 2018-05-01 00:00:00  2018-05-01 MA      MA-MI-JU       0        0    30
    2 2018-05-01 01:00:00  2018-05-01 MA      MA-MI-JU       1        0    80
    3 2018-05-01 02:00:00  2018-05-01 MA      MA-MI-JU       2        0    42
    4 2018-05-01 03:00:00  2018-05-01 MA      MA-MI-JU       3        0    90
    5 2018-05-01 04:00:00  2018-05-01 MA      MA-MI-JU       4        0    95
    6 2018-05-01 05:00:00  2018-05-01 MA      MA-MI-JU       5        0     5

DayCategory groups days of the week in the following way: Mondays go to …
I would like to export tables of the following result of a repeated-measures ANOVA. Here is the function in which the ANOVA test has been implemented:

    fAddANOVA = function(data) data %>%
      ezANOVA(dv = .(value), wid = .(ID), within = .(COND)) %>%
      as_tibble()

And here are the commands to explore the ANOVA statistics:

    aov_stats <- df_join %>%
      group_by(signals) %>%
      mutate(ANOVA = map(data, ~fAddANOVA(.x))) %>%
      dplyr::select(., -data) %>%
      unnest(ANOVA)

    > aov_stats
    # A tibble: 12 x 4
    # Groups: signals [12]
    signals ANOVA$Effect $DFn $DFd $F …
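For reference, a minimal sketch of one way to export such a tibble, assuming aov_stats is structured as printed above (the readr/knitr calls are illustrative, not the asker's code):

    library(readr)
    library(knitr)

    # If the ANOVA column prints as packed (ANOVA$Effect etc.),
    # flatten it first so every statistic gets its own plain column.
    aov_flat <- tidyr::unpack(aov_stats, cols = ANOVA)

    # Plain CSV, one row per signal/effect.
    write_csv(aov_flat, "aov_stats.csv")

    # Or a simple HTML table for a report.
    kable(aov_flat, format = "html")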
I am trying to create a table like the one following. I have one product line and one dimension, "zones", and I want to add a few columns: a total for pieces on a line, a fixed value per product (stock mini, i.e. minimum stock), a calculated field stock mini - total, and an icon to tell me whether stock mini - total is greater or less than 0. After struggling I managed to create my columns, but I found no solution to add a …
I just started learning data science and am having a problem when generating a dataset. Dataset:

    covid_data = pd.read_csv(r"C:\Users\Test\OneDrive\Desktop\Project_test\data.csv")

For some reason, when I try to create a new dataset, it creates an additional column "cases)" and adds NaN values automatically. It happens randomly: it works for a while, and when I restarted my Jupyter notebook it happened again. Any idea how to prevent this issue? I obtained the dataset from https://opendata.ecdc.europa.eu/covid19/nationalcasedeath_eueea_daily_ei/csv/data.csv Screenshot of the data.csv file:
I am fairly new to R and recently came across the data.table package, which is roughly ten times faster than dplyr in various operations. I have a Shiny app based on Covid data that is getting heavier each day; I am already not impressed with the loading time and expect it to get slower with each passing day. In the Shiny app I have provided several input options to the user, hence the reactive elements and filter & …
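As a point of comparison, a minimal sketch of the same filter written both ways (the data, column names, and input values are made up for illustration; in a real app, input would be Shiny's input object inside a reactive):

    library(data.table)
    library(dplyr)

    # Stand-in for Shiny's input object, just for this sketch
    input <- list(country = "DE", date_from = as.Date("2021-01-01"))

    covid_df <- data.frame(
      country = c("DE", "FR", "DE"),
      date    = as.Date(c("2021-01-02", "2021-01-03", "2020-12-30")),
      cases   = c(10, 20, 30)
    )

    # dplyr version, as it might appear inside a reactive()
    filtered_df <- covid_df %>%
      dplyr::filter(country == input$country, date >= input$date_from)

    # data.table translation of the same filter
    covid_dt <- as.data.table(covid_df)
    filtered_dt <- covid_dt[country == input$country & date >= input$date_from]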
I have multiple data frames with the same column names. I want to write them together to an Excel sheet, stacked vertically on top of each other, and between each there will be a text occupying a row. This is what I have in mind. I tried the pandas.ExcelWriter() method, but each dataframe overwrites the previous frame in the sheet instead of appending. Note that I still need multiple sheets for different dataframes, but also multiple dataframes on each sheet. Is …
So I have this table above. I'm trying to aggregate the occupations so that the table results in: I've tried using df.groupby(['Occupation']) but I get an error. All I know is that my final step would be to set the index to "Occupation". But I still don't know how to group via entries in the single Occupation column here. Also, what would this type of final table be called? I know it's not called a multi-index table because there is …
Here is the GitHub link to the most recent data.table benchmark. The data.table benchmarks have not been updated since 2014. I heard somewhere that pandas is now faster than data.table. Is this true? Has anyone done any benchmarks? I have never used Python before, but would consider switching if pandas can beat data.table.
I face a problem which I'd like to solve without any programming, and I am looking for software to do this. I have a dataset, for example (brand-id, brand-name, product-class-name):

    0, Audi, economy business premium;
    1, Rolls Royce, luxury;
    2, Seat, economy;
    3, Tesla, business premium;

And I'd like to automatically process this dataset, resulting in an additional table that classifies the parameters in column 3, like (product-class-id, product-class-name, brand-id):

    0, economy, 0 2;
    1, business, 0 3;
    2, premium, 0 …
I have a silly question. Below is the output of a logistic regression analysis I did. I notice that when I switch the order of the arguments I pass to the table function in R, it also switches the false positive and false negative values, but it does not switch the location of the Female and Male rows and columns. This seems like it could really affect the interpretation if the false positives/negatives can change …
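A minimal sketch of what swapping the arguments does (made-up vectors; the dnn argument labels the dimensions so rows and columns cannot be misread):

    # Made-up predictions and ground truth for illustration
    actual    <- factor(c("Female", "Male", "Male", "Female", "Male"))
    predicted <- factor(c("Female", "Male", "Female", "Female", "Male"))

    # The first argument becomes the rows, the second the columns
    table(predicted, actual, dnn = c("Predicted", "Actual"))
    table(actual, predicted, dnn = c("Actual", "Predicted"))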
I am trying to do simple calculations in R when no raw data is available, only grouped data with frequencies. This is the case when I have a large number of records in a database, say a large SQL table, and for given reasons I GROUP BY and COUNT to aggregate instead of downloading the original table for analysis in R. As I understand it, one could say in R that I'm talking about data in a table format. To …
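For instance, a minimal sketch of computing a mean and variance from such grouped counts (the column names are assumptions):

    # Grouped data as it might come back from SELECT value, COUNT(*) ... GROUP BY value
    grouped <- data.frame(value = c(1, 2, 5), n = c(10, 30, 5))

    # Weighted mean directly from the frequencies
    m <- weighted.mean(grouped$value, grouped$n)

    # Or expand back to pseudo-raw data when a function needs individual records
    raw <- rep(grouped$value, times = grouped$n)
    mean(raw); var(raw)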
This is something I can't achieve with the reshape2 library for R. I have the following data:

       zone code literal
    1:    A   14    bicl
    2:    B   14    bicl
    3:    B   24   calso
    4:    A   51    mara
    5:    B   51    mara
    6:    A  125     gan
    7:    A  143    carc
    8:    B  143    carc

i.e. each zone has 4 codes with its corresponding literal. I would like to transform it to a dataset with one column for each of the four codes …
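One possible approach, assuming the goal is one code/literal pair per column within each zone: data.table's dcast accepts multiple value.var columns, which reshape2's dcast does not. A sketch:

    library(data.table)

    dt <- data.table(
      zone    = c("A", "B", "B", "A", "B", "A", "A", "B"),
      code    = c(14, 14, 24, 51, 51, 125, 143, 143),
      literal = c("bicl", "bicl", "calso", "mara", "mara", "gan", "carc", "carc")
    )

    # rowid(zone) numbers the codes within each zone (1..4),
    # then both columns are spread wide in one call
    wide <- dcast(dt, zone ~ rowid(zone), value.var = c("code", "literal"))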
I have this data set which consists of ISO 3166 alpha-2 codes for countries, for example DE, AD, AE, etc. They are coded as factor variables in R and there are about 173 observations. Now, because there are too many and this would just overwhelm a boxplot, I want to make a contingency table with other variables by condensing the codes and creating shorter categories (also coded as factors), for example DE, RE, ED, FR -> Europe; CA, US -> …
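A minimal sketch of one way to collapse the factor levels into regions (forcats::fct_collapse is one option; base R levels<- works too; the vectors here are made up):

    library(forcats)

    # Made-up factor with a few ISO 3166 alpha-2 codes
    country <- factor(c("DE", "FR", "CA", "US", "DE"))

    # Map individual codes onto coarser region levels
    region <- fct_collapse(country,
      Europe       = c("DE", "FR"),
      NorthAmerica = c("CA", "US")
    )

    table(region)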
I know that I can read a very large CSV file in much faster with fread from the data.table library than with read.csv, which reads a file in as a data.frame. However, dplyr can only perform operations on a data.frame. My questions are: Why was dplyr built to work with the slower of the two data structures? When working with big data, is it good practice to read in as a data.table and then convert to a data.frame to perform dplyr operations? Is there …
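For reference, a minimal sketch of the read-then-convert pattern the question describes (the file and column names are placeholders):

    library(data.table)
    library(dplyr)

    dt <- fread("big_file.csv")   # fast read; returns a data.table

    # A data.table also inherits from data.frame, but it can be
    # converted explicitly if plain data.frame semantics are wanted:
    df <- as.data.frame(dt)

    result <- df %>%
      filter(!is.na(value)) %>%
      group_by(group) %>%
      summarise(mean_value = mean(value))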
I have a table of features and labels where each row has a timestamp. The labels are categorical, and they come in batches where one label repeats several times. Batches with the same label do not have a specific order. The number of repetitions of the same label in one batch is always the same. In the example below, every three rows have the same label. I would like to get a new table where Var 1 and Var 2 …
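Assuming the goal is one row per batch with the variables aggregated (the names and the use of the mean are assumptions), a sketch using data.table's rleid to number consecutive runs of the same label:

    library(data.table)

    # Made-up example: batches of three consecutive rows share a label
    dt <- data.table(
      label = rep(c("a", "b", "a"), each = 3),
      var1  = 1:9,
      var2  = 9:1
    )

    # rleid() gives each consecutive run of identical labels its own id,
    # so repeated label values in different batches stay separate
    dt[, batch := rleid(label)]
    summary_dt <- dt[, .(label = label[1], var1 = mean(var1), var2 = mean(var2)),
                     by = batch]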
How do you view, in pandas, the elements of a CSV table with many columns (>25) where the names of some columns are more than 10 characters long? I have 5000 rows and 32 columns, and the labels of some columns are more than 10 characters. How can I see them and work with the different columns? Excel does not work: all of the items come out sloppy. Access is OK but could not detect the long labels of the items. What is your …
Given: a monthly percentage (%) metric has to be calculated by dividing a column ('Numerator') from one table by a column ('Denominator') from another table, both filtered by month, as in the example below:

Table 1:

    Date_1     Numerator
    01-Jan-19  5
    05-Feb-19  4
    04-Apr-19  1
    07-May-19  3
    11-Jun-19  5
    22-Jun-19  4
    25-Jul-19  5
    31-Aug-19  1
    03-Sep-19  4
    25-Oct-19  5

Table 2:

    Date_2     Denominator
    03-Jan-19  7
    05-Jan-19  9
    16-Feb-19  8
    22-Feb-19  7
    04-Mar-19  10
    18-Mar-19  8
    24-Apr-19  8
    25-Apr-19  8
    01-May-19  …
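The question does not name a tool, so this is only a sketch of the computation itself in R: sum each table by month, then join on the month before dividing (the data here are tiny stand-ins for the tables above):

    # Tiny stand-ins for Table 1 and Table 2
    t1 <- data.frame(Date_1 = as.Date(c("2019-01-01", "2019-02-05")),
                     Numerator = c(5, 4))
    t2 <- data.frame(Date_2 = as.Date(c("2019-01-03", "2019-01-05", "2019-02-16")),
                     Denominator = c(7, 9, 8))

    # Sum each table by calendar month
    num <- aggregate(Numerator ~ format(Date_1, "%Y-%m"), data = t1, FUN = sum)
    den <- aggregate(Denominator ~ format(Date_2, "%Y-%m"), data = t2, FUN = sum)
    names(num)[1] <- "month"; names(den)[1] <- "month"

    # Join on month and compute the percentage metric
    monthly <- merge(num, den, by = "month")
    monthly$pct <- 100 * monthly$Numerator / monthly$Denominator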
What are the most effective bread-and-butter in-memory open-source tabular data frameworks today? I have been working with tabular data for years with an in-house solution that integrates well with Excel but falls short of many other expectations. I would like (if possible/true) to demonstrate that our solution has fallen behind the times. In other words, assume an SQL-like platform is responsible for persistence of a data set, but cycle-intensive calculations need to be performed on that dataset (e.g. …