dplyr

R code making 1 column into multiple columns with their unique ID

codingc0nfusions

May 3, 2022 23:55

Currently stuck on a data wrangling question in R. So far I've tried variations of this code using tidyverse package, columns 5 and 6 here were the rating and the user: df[,5:6] %>% pivot_wider(names_from = question, values_from = rating, names_sep = ".") %>% unnest(cols = everything())-> df_reformat Each column will be the question ID and the rows are the scores for each user, ideally clustered by group. Data structure needed: repID user Customer question 1 Customer question 2 .... Customer …

Topic: dplyr data-wrangling data-formats data-cleaning r

Category: Data Science

Group_by 2 variables and pivot_wider distribution based on 2 others

DataGuy23

March 17, 2022 10:00

I'm performing some calculations on a dataframe and am stuck trying to calculate a few percentages. I'm trying to append 3 additional columns for %POS/NEG/NEU. E.g., the sum of the amount col for all observations w/ POS Direction in both Drew & A / total sum of all amounts for Drew. Name Rating Amount Price Rate Type Direction Drew A 455 99.54 4.5 white POS Drew A 655 88.44 5.3 white NEG Drew B 454 54.43 3.4 blue NEU Drew B 654 33.54 5.4 …

Topic: groupby dplyr data-wrangling data-cleaning r

Category: Data Science

Mutate with dynamic column names dplyr

3nomis

March 4, 2022 23:54

Hi, I have this dataset (it has many more columns): media brand radio tv cinema <chr> <dbl> <dbl> <dbl> <dbl> radio 0 0 0 0 tv 0 0 0 0 cinema 0 0 0 0 tv 0 0 0 0 radio 0 0 0 0 tv 0 0 0 0 I want to obtain the following (assign a 1 to each column based on the value of the media column): media brand radio tv cinema <chr> <dbl> <dbl> <dbl> <dbl> radio 0 …

Topic: dplyr preprocessing r machine-learning

Category: Data Science

Divide a column by itself with mutate_at dplyr

3nomis

March 4, 2022 23:49

Hi, I'd like to turn each non-zero value of my selected columns into a 1 using mutate_at(): BRAND MEDIA_TYPE INV1 INV2 <chr> <chr> <dbl> <dbl> b1 newspapers 2 27 b1 magazines 3 0 b2 newspapers 0 0 b3 tv 1 145 b4 newspapers 4 40 b5 newspapers 5 0 b1 newspapers 1 0 b2 newspapers 0 28 The final result should be as follows: BRAND MEDIA_TYPE INV1 INV2 <chr> <chr> <dbl> <dbl> b1 newspapers 1 1 b1 magazines 1 …

Topic: dplyr dataframe programming r

Category: Data Science
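For the mutate_at() question above, here is a minimal sketch of one possible approach, not necessarily the answer the thread settled on. It uses only the first four sample rows from the excerpt, and the name `result` is made up for illustration. Note that literally dividing a column by itself would give NaN for the zero rows (0/0 is NaN in R), so the sketch compares against zero instead.

```r
library(dplyr)

# First four sample rows copied from the question excerpt.
df <- tibble(
  BRAND      = c("b1", "b1", "b2", "b3"),
  MEDIA_TYPE = c("newspapers", "magazines", "newspapers", "tv"),
  INV1       = c(2, 3, 0, 1),
  INV2       = c(27, 0, 0, 145)
)

# `. != 0` yields TRUE/FALSE per cell; as.numeric() turns that into 1/0.
result <- df %>%
  mutate_at(vars(INV1, INV2), ~ as.numeric(. != 0))

print(result)
```

In current dplyr the same idea can be written with `mutate(across(c(INV1, INV2), ~ as.numeric(.x != 0)))`, since the `_at` variants are superseded.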

Mutate with custom function in R does not work

Nuibb

March 4, 2022 21:40

I have a data frame containing a column called "Frequency". Frequency has values like "Year", "Week", "Month", etc. Now I want to create a new column based on the Frequency column, where Year's corresponding value will be 1, Month's corresponding value will be 12, and Week's corresponding value will be 48. I tried to write a function for this called "getValue" and tried to create the new column by applying a mutation (dplyr) with that function. But unfortunately I am …

Topic: dplyr dataframe r

Category: Data Science
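The excerpt of the "Mutate with custom function" question above is cut off before the error, so the cause cannot be confirmed here. A frequent reason for this kind of failure is a helper built on `if`/`else`, which is not vectorised, while `mutate()` passes the whole column at once. The sketch below assumes that situation; `get_value` and `PerYear` are hypothetical names standing in for the question's unseen `getValue`.

```r
library(dplyr)

df <- tibble(Frequency = c("Year", "Week", "Month", "Year"))

# Hypothetical stand-in for the question's getValue(): case_when() is
# vectorised, so it returns one value per element of the column.
get_value <- function(frequency) {
  case_when(
    frequency == "Year"  ~ 1,
    frequency == "Month" ~ 12,
    frequency == "Week"  ~ 48,
    TRUE                 ~ NA_real_  # anything else becomes NA
  )
}

result <- df %>% mutate(PerYear = get_value(Frequency))

print(result)
```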

More efficient way to create frequency column based on different groupings

Curt

February 3, 2022 21:12

I have code below that calculates a frequency for each column element (respective to its own column) and adds all five frequencies together in a column. The code works but is very slow, and the majority of the processing time is spent on this process. Any ideas to accomplish the same goal but more efficiently? Create_Freq <- function(Word_List) { library(dplyr) Word_List$AvgFreq <- (Word_List%>% add_count(FirstLet))[,"n"] + (Word_List%>% add_count(SecLet))[,"n"] + (Word_List%>% add_count(ThirdtLet))[,"n"] + (Word_List%>% add_count(FourLet))[,"n"] + (Word_List%>% add_count(FifthLet))[,"n"] return(Word_List) }

Topic: dplyr r efficiency

Category: Data Science

R error with filter: x comparison (1) is possible only for atomic and list types

NB3

January 24, 2022 12:18

I'm working through an error that I just can't seem to troubleshoot properly. I have a dataset from Stata that I'm working with in R. I have also converted it to a csv for my own different needs in different scripts. Currently working with the csv copy, I am running into an error that I do not experience from the .dta copy - any insight would be appreciated. I am trying to filter for only rows where the binary variable …

Topic: dplyr

Category: Data Science

Is it worth switching from dplyr code to data.table in r shiny that have various reactive data?

ViSa

August 19, 2021 08:10

I am fairly new to R and recently came across the data.table package, which is probably ~10 times faster than dplyr in various operations. I have a shiny app based on Covid data that is getting heavier each day. I am already not impressed with the loading time, and I expect it to get slower with each passing day. In the shiny app I have provided several input options to the user, hence the reactive elements and filter & …

Topic: dplyr data-table r

Category: Data Science

Flag consecutive dates by group

Marvin Aliaga

May 14, 2021 09:32

Below is an example of my data (Room and Date). I would like to generate variables Goal1 , Goal2 and Goal3. Every time there is a gap in the Date variable means that the room was closed. My goal is to identify consecutive dates by room. Room Date Goal1 Goal2 Goal3 1 Upper A 2021-01-01 1 2021-01-01 2021-01-02 2 Upper A 2021-01-02 1 2021-01-01 2021-01-02 3 Upper A 2021-01-05 2 2021-01-05 2021-01-05 4 Upper A 2021-01-10 3 2021-01-10 2021-01-10 5 …

Topic: dplyr dataframe r

Category: Data Science
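For the "Flag consecutive dates by group" question above, one way to read the sample is that Goal1 is a run counter per room, and Goal2/Goal3 are the first and last date of each run. The sketch below works under that assumption and uses only the four complete rows visible in the excerpt. It illustrates the idea and is not the thread's accepted answer.

```r
library(dplyr)

# The four complete rows visible in the excerpt.
df <- tibble(
  Room = "Upper A",
  Date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-05", "2021-01-10"))
)

result <- df %>%
  arrange(Room, Date) %>%
  group_by(Room) %>%
  # A new run starts at the first row and wherever the gap to the
  # previous date is not exactly one day.
  mutate(Goal1 = cumsum(c(TRUE, as.numeric(diff(Date)) != 1))) %>%
  group_by(Room, Goal1) %>%
  # First and last date of each run of consecutive days.
  mutate(Goal2 = min(Date), Goal3 = max(Date)) %>%
  ungroup()

print(result)
```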

Theoretical Question: Data.table vs Data.frame with Big Data

Bear

January 21, 2021 20:22

I know that I can read in a very large csv file much faster with fread using the data.table library than with read.csv that reads a file in as a data.frame. However, dplyr can only perform operations on data.frame. My questions are: Why was dplyr built to work with the slower of the two data structures? When working with big data is it good practice to read in as data.table then convert to data.frame to perform dplyr operations? Is there …

Topic: dplyr data-table dataframe r

Category: Data Science

How to add a column for descending row numbers into dataset in R

BioIsaac

January 11, 2021 01:44

I am new to R and would like to insert a new column that numbers the rows of a large dataset. I have no idea how to use 'mutate()' to insert this. Would appreciate any help. Thanks.

Topic: dplyr dataset r

Category: Data Science
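For the row-numbering question above, a minimal sketch with a made-up four-row data frame (the column names `value`, `row_id`, and `row_id_desc` are illustrative only). It relies on dplyr's `row_number()` and `n()`, which give the position of each row and the total row count inside `mutate()`.

```r
library(dplyr)

# Toy data; the real dataset in the question is not shown.
df <- tibble(value = c("a", "b", "c", "d"))

result <- df %>%
  mutate(
    row_id      = row_number(),            # ascending: 1, 2, ..., n
    row_id_desc = n() - row_number() + 1L  # descending: n, n - 1, ..., 1
  )

print(result)
```

If the data is grouped with `group_by()`, both helpers restart within each group, so call `ungroup()` first when a single running number is wanted.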

R: Rates of change from an initial value

Kota_K

October 30, 2020 01:04

I have a collection of csvs and must produce yearly rates of change per group within each csv, as well as a rate of change compared to the initial value. I am using the function below to calculate yearly rates of change, and it works fine through my loop. func <- function(x, n=1) { c(rep(NA, n), diff(x, n) / head(x, -1*n)*100) } df$RateOfChange <- ave(df$Data, factor(df$Group), FUN=func) Year Group Data RateOfChange 2010 1a 3.5 NA 2011 1a 4 14.29 2012 …

Topic: dplyr dataframe r

Category: Data Science

Find the mode value and frequency in R

DataGuy23

October 30, 2020 00:01

I'm trying to come up with a function in R that gives the mode value of a column along with the number of times (or frequency) that the value occurs. I want it to exclude missing (or blank) values, and treat ties by showing both values. When there are no repeating values I want it to return the first-appearing value that is found along with its frequency 1. "Name Color Drew Blue Drew Green Drew Red Bob Green Bob Green …

Topic: dplyr data statistics data-cleaning r

Category: Data Science

Weighted mean with summarise_at dplyr

3nomis

August 10, 2020 23:04

I strictly need to use the summarise_at to compute a weighted mean, with weights based on the values of another column df %>% summarise_at(.vars = vars(FACTOR,tv:`smart tv/console`), .funs = weighted.mean, w=INVESTMENT, na.rm=TRUE) It always shows the error: 'INVESTMENT' is not found. I then tried with: df %>%summarise_at(.vars = vars(FACTOR,tv:`smart tv/console`), .funs = weighted.mean, w=vars(INVESTMENT), na.rm=TRUE) But in this case : Evaluation error: 'x' and 'w' must have the same length. Why is this? Am I doing anything wrong? Do you …

Topic: dplyr dataset r data-mining

Category: Data Science
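On the summarise_at() question above: arguments passed through `...` are evaluated in the calling environment before the function is applied, which would explain why `INVESTMENT` is not found there. A sketch of one workaround is to wrap the call in a formula lambda, which dplyr evaluates against the data. The toy columns `tv` and `radio` below are assumptions standing in for the question's `FACTOR, tv:`smart tv/console`` selection, which is not fully shown.

```r
library(dplyr)

# Toy data; only INVESTMENT and tv are names taken from the question.
df <- tibble(
  tv         = c(1, 2, NA, 4),
  radio      = c(10, 20, 30, 40),
  INVESTMENT = c(1, 1, 2, 2)
)

# The lambda is evaluated with the data's columns in scope, so INVESTMENT
# is visible; `.` stands for the column currently being summarised.
result <- df %>%
  summarise_at(
    vars(tv, radio),
    ~ weighted.mean(., w = INVESTMENT, na.rm = TRUE)
  )

print(result)
```

With `na.rm = TRUE`, `weighted.mean()` drops missing values of the summarised column together with their weights, so the `NA` in `tv` does not affect the result.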

How to split data in R using dplyr if we want to have rows of the same group to belong to the same split?

Dee

June 17, 2020 17:21

In my current pipeline, I suspect there is data leakage. This is because the same person, though with slightly different values, is in both the training and testing sets. As a result, my model is overfitting. For example, my data looks like this: PID Var_1 Var_2 Person A 0 1 Person B 0 1 Person C 0 0 Person A 1 3 Person B 1 2 Person D 0 1 Person C 0 1 I want to split this …

Topic: dplyr data-cleaning r

Category: Data Science

How to create a barplot using multiple column

thoms

June 10, 2020 11:38

My data is about board games, each of which has one or more categories, like this: categorie1 categorie2 categorie3 Deduction Medieval Word Game Deduction Medieval NA Card Game Medieval Zombies Horror NA NA Horror Medieval Zombies I would like to create a barplot showing the most common categories across games, but I can't figure out how to do it with multiple columns instead of one. Is there a dplyr method?

Topic: dplyr ggplot2 r

Category: Data Science

Sort a data frame column based on another sorted column value in R

Sddr

June 8, 2020 06:43

I have a data frame that is sorted based on one column (numeric column) to assign the rank. If this column value is zero then arrange the data frame based on another character column for those rows which have zero as a value in a numeric column. But to give rank I have to consider var2 that is the reason I sorted based on var2, if there is any identical values in var2 for those rows I have to consider …

Topic: dplyr r

Category: Data Science

R: Producing multiple plots (ggplot, geom_point) from a single CSV with multiple subcategories

Kota_K

January 10, 2020 00:13

I have a collection of bacteria data from approximately 140 monitoring locations in California. I would like to produce a scatterplot for each monitoring location with the Sampling Date on the Y-axis and the Bacteria Data on the X-axis. The Sampling Date, Bacteria Data, and Monitoring Location all reside within their own column. I've come up with the below code: ## Create List of Files ## filenames <- list.files(path = "C:\\Users\\...") ## Combine into one CSV ## All_Data <- ldply(filenames, …

Topic: dplyr ggplot2 r

Category: Data Science

R summarise with condition

nut get

September 28, 2018 13:42

I have customer data with the products they purchased and the purchase date. I want to extract a result that shows each customer and the first two fruits they purchased. My actual set has 90000 rows with 9000 unique customers. I have tried groupby and summarise functions but I would like to be able to use summarise with condition like we use select with a where clause. Thanks for your suggestions

Topic: dplyr r

Category: Data Science

Group_by field is not showing in the summarise output in R

snehal sharma

May 7, 2018 09:50

In R, using the dplyr package, I tried the function "summarise" and I expect the result to show along with the group_by field. However, all of a sudden I see summarized output without the group_by field, which makes the results meaningless. Does anyone have any idea? For example, if the data has ages and salaries for members belonging to a team and I try to view the average age and salary by team, the code below returns just the averages without the team …

Topic: dplyr r

Category: Data Science
