R code making 1 column into multiple columns with their unique ID

I'm stuck on a data-wrangling problem in R. So far I've tried variations of the following tidyverse code, where columns 5 and 6 are the rating and the user:

df[,5:6] %>% pivot_wider(names_from = question, values_from = rating, names_sep = ".") %>% unnest(cols = everything()) -> df_reformat

Each column should be a question ID and each row should hold the scores for one user, ideally clustered by group. Data structure needed: repID user Customer question 1 Customer question 2 .... Customer …
Category: Data Science
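One possible approach, sketched under assumptions: the long-format data has columns named repID, user, question and rating (names taken from the question text; the toy values are invented). If df[,5:6] holds only the rating and user columns, the question column used in names_from is not part of that selection, so the identifier columns need to be kept when pivoting.

```r
library(tidyr)

# Hypothetical long-format ratings: one row per (repID, user, question).
ratings <- data.frame(
  repID    = c(1, 1, 1, 1, 2, 2),
  user     = c("u1", "u1", "u2", "u2", "u3", "u3"),
  question = c("Q1", "Q2", "Q1", "Q2", "Q1", "Q2"),
  rating   = c(5, 3, 4, 2, 1, 5)
)

# Keep the identifier columns and spread question into one column per ID.
wide <- pivot_wider(
  ratings,
  id_cols     = c(repID, user),
  names_from  = question,
  values_from = rating
)

# Cluster rows by group, then by user.
wide <- wide[order(wide$repID, wide$user), ]
print(wide)
```

If a user rated the same question more than once, pivot_wider() warns and produces list-columns, which may be why unnest() seemed necessary; passing values_fn = mean, or adding a row identifier to id_cols, avoids that.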

Group_by 2 variables and pivot_wider distribution based on 2 others

I'm performing some calculations on a data frame and am stuck trying to calculate a few percentages. I want to append 3 columns: %POS, %NEG and %NEU. For example, %POS for Drew with Rating A is the sum of the Amount column over all observations with Direction POS, Name Drew and Rating A, divided by the total Amount for Drew.

Name  Rating  Amount  Price  Rate  Type   Direction
Drew  A       455     99.54  4.5   white  POS
Drew  A       655     88.44  5.3   white  NEG
Drew  B       454     54.43  3.4   blue   NEU
Drew  B       654     33.54  5.4   …
Category: Data Science
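A minimal base-R sketch of the calculation described in the question, using only the first three rows of the excerpt (the fourth is truncated) and only the Name, Rating, Amount and Direction columns. Each Amount is divided by its Name total, the shares are summed per Name/Rating/Direction combination, and reshape() spreads Direction into columns; combinations that never occur come out as NA. In the tidyverse, group_by(Name) with mutate() followed by pivot_wider() would be the analogous route.

```r
trades <- data.frame(
  Name      = c("Drew", "Drew", "Drew"),
  Rating    = c("A", "A", "B"),
  Amount    = c(455, 655, 454),
  Direction = c("POS", "NEG", "NEU")
)

# Share of each row's Amount in the total Amount for its Name, in percent.
name_total   <- ave(trades$Amount, trades$Name, FUN = sum)
trades$share <- trades$Amount / name_total * 100

# Sum the shares per Name / Rating / Direction combination.
agg <- aggregate(share ~ Name + Rating + Direction, data = trades, FUN = sum)

# One column per Direction: share.NEG, share.NEU, share.POS.
wide <- reshape(agg, idvar = c("Name", "Rating"), timevar = "Direction",
                v.names = "share", direction = "wide")
print(wide)
```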

Mutate with dynamic column names dplyr

I have this dataset (it has many more columns):

media   brand  radio  tv     cinema
<chr>   <dbl>  <dbl>  <dbl>  <dbl>
radio   0      0      0      0
tv      0      0      0      0
cinema  0      0      0      0
tv      0      0      0      0
radio   0      0      0      0
tv      0      0      0      0

I want to obtain the following (assign a 1 to each column based on the value of the media column):

media   brand  radio  tv     cinema
<chr>   <dbl>  <dbl>  <dbl>  <dbl>
radio   0      …
Category: Data Science
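A base-R sketch with toy data shaped like the excerpt: the indicator columns are found dynamically as the column names that also occur as values of media, and each one gets a 1 on the rows whose media matches its name. With dplyr 1.0 or later, mutate(across(c(radio, tv, cinema), ~ as.numeric(media == cur_column()))) expresses the same idea.

```r
# Toy data shaped like the excerpt (the real data has more columns).
ads <- data.frame(
  media = c("radio", "tv", "cinema", "tv", "radio", "tv"),
  brand = 0, radio = 0, tv = 0, cinema = 0
)

# Columns whose names appear as values of `media` are the indicator columns.
indicator_cols <- intersect(names(ads), unique(ads$media))

# For each indicator column, set 1 on the rows whose media matches its name.
for (col in indicator_cols) {
  ads[[col]][ads$media == col] <- 1
}
print(ads)
```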

Divide a column by itself with mutate_at dplyr

I'd like to turn each non-zero value of my selected columns into a 1 using mutate_at():

BRAND  MEDIA_TYPE  INV1   INV2
<chr>  <chr>       <dbl>  <dbl>
b1     newspapers  2      27
b1     magazines   3      0
b2     newspapers  0      0
b3     tv          1      145
b4     newspapers  4      40
b5     newspapers  5      0
b1     newspapers  1      0
b2     newspapers  0      28

The final result should look like this:

BRAND  MEDIA_TYPE  INV1   INV2
<chr>  <chr>       <dbl>  <dbl>
b1     newspapers  1      1
b1     magazines   1      …
Category: Data Science
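Dividing a column by itself is not a safe route, because 0 / 0 is NaN in R. A sketch that tests for non-zero values instead, using the data from the excerpt; mutate(across()) is the dplyr 1.0 replacement for mutate_at(), and the comment shows the scoped form the question asks about.

```r
library(dplyr)

inv <- data.frame(
  BRAND      = c("b1", "b1", "b2", "b3", "b4", "b5", "b1", "b2"),
  MEDIA_TYPE = c("newspapers", "magazines", "newspapers", "tv",
                 "newspapers", "newspapers", "newspapers", "newspapers"),
  INV1 = c(2, 3, 0, 1, 4, 5, 1, 0),
  INV2 = c(27, 0, 0, 145, 40, 0, 0, 28)
)

# Replace every non-zero value with 1; zeros stay 0.
out <- inv %>% mutate(across(c(INV1, INV2), ~ as.numeric(.x != 0)))
# Scoped-verb equivalent: inv %>% mutate_at(vars(INV1, INV2), ~ as.numeric(. != 0))
print(out)
```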

Mutate with custom function in R does not work

I have a data frame containing a column called "Frequency", with values like "Year", "Week", "Month", etc. I want to create a new column based on Frequency, where "Year" maps to 1, "Month" maps to 12 and "Week" maps to 48. I wrote a function, "getValue", for this and tried to create the new column by applying it with dplyr's mutate(). But unfortunately I am …
Topic: dplyr dataframe r
Category: Data Science
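The getValue function itself is truncated, so this is only a guess at the cause: a helper built on if/else usually handles one value at a time, whereas mutate() passes it the whole column. A vectorised alternative is a named lookup vector indexed by the Frequency values, sketched in base R below. The "Day" row and the Value column name are hypothetical; values without a mapping become NA. Inside dplyr the same lookup works as mutate(Value = unname(periods_per_year[Frequency])).

```r
# Mapping from the question: Year -> 1, Month -> 12, Week -> 48.
periods_per_year <- c(Year = 1, Month = 12, Week = 48)

freq_df <- data.frame(Frequency = c("Year", "Week", "Month", "Day"))

# Index the named vector by the (character) column; "Day" has no mapping -> NA.
freq_df$Value <- unname(periods_per_year[as.character(freq_df$Frequency)])
print(freq_df)
```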

More efficient way to create frequency column based on different groupings

The code below calculates a frequency for each column element (relative to its own column) and adds all five frequencies together in a new column. It works but is very slow, and most of the processing time is spent here. Any ideas for accomplishing the same goal more efficiently?

```r
Create_Freq <- function(Word_List) {
  library(dplyr)
  Word_List$AvgFreq <- (Word_List %>% add_count(FirstLet))[, "n"] +
    (Word_List %>% add_count(SecLet))[, "n"] +
    (Word_List %>% add_count(ThirdtLet))[, "n"] +
    (Word_List %>% add_count(FourLet))[, "n"] +
    (Word_List %>% add_count(FifthLet))[, "n"]
  return(Word_List)
}
```
Category: Data Science
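One way to avoid building five intermediate data frames is to count within each column directly. The base-R sketch below encodes each column with match() against its unique values, looks up the counts with tabulate(), and sums the five count vectors with Reduce(). The column names are copied from the question (including ThirdtLet as spelled there); Create_Freq_fast, count_within and the toy word list are hypothetical. Like add_count(), it treats NA as its own group.

```r
# For every element, count how often its value occurs in the same column.
count_within <- function(x) {
  codes <- match(x, unique(x))  # integer code per element
  tabulate(codes)[codes]        # occurrences of each code, mapped back to rows
}

Create_Freq_fast <- function(Word_List,
                             cols = c("FirstLet", "SecLet", "ThirdtLet",
                                      "FourLet", "FifthLet")) {
  # Sum the per-column counts row-wise.
  Word_List$AvgFreq <- Reduce(`+`, lapply(Word_List[cols], count_within))
  Word_List
}

words <- data.frame(
  FirstLet  = c("a", "a", "b"),
  SecLet    = c("b", "c", "c"),
  ThirdtLet = c("c", "c", "c"),
  FourLet   = c("d", "e", "d"),
  FifthLet  = c("e", "e", "f")
)
res <- Create_Freq_fast(words)
print(res$AvgFreq)
```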

R error with filter: x comparison (1) is possible only for atomic and list types

I'm working through an error that I can't seem to troubleshoot. I have a dataset from Stata that I'm working with in R, and I have also converted it to a CSV for use in other scripts. Working with the CSV copy, I run into an error that I do not get with the .dta copy; any insight would be appreciated. I am trying to filter for only rows where the binary variable …
Topic: dplyr
Category: Data Science

Is it worth switching from dplyr to data.table in an R Shiny app with many reactive datasets?

I am fairly new to R and recently came across the data.table package, which is reportedly much faster than dplyr (by some accounts around 10 times) for various operations. I have a Shiny app based on COVID data that gets heavier each day; I am already unhappy with the loading time and expect it to get slower every day. In the Shiny app I provide several input options to the user, hence the reactive elements and filter & …
Category: Data Science

Flag consecutive dates by group

Below is an example of my data (Room and Date). I would like to generate the variables Goal1, Goal2 and Goal3. A gap in the Date variable means that the room was closed. My goal is to identify consecutive dates by room.

   Room     Date        Goal1  Goal2       Goal3
1  Upper A  2021-01-01  1      2021-01-01  2021-01-02
2  Upper A  2021-01-02  1      2021-01-01  2021-01-02
3  Upper A  2021-01-05  2      2021-01-05  2021-01-05
4  Upper A  2021-01-10  3      2021-01-10  2021-01-10
5  …
Topic: dplyr dataframe r
Category: Data Science
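A base-R sketch of one way to build the three goal columns: after sorting by room and date, a new run starts whenever the gap to the previous date is not exactly one day, so a cumulative sum of those breaks numbers the runs (Goal1), and the minimum and maximum date within each run give Goal2 and Goal3. The Upper A rows come from the question; the Lower B rows are hypothetical, added to show the per-room grouping.

```r
rooms <- data.frame(
  Room = c("Upper A", "Upper A", "Upper A", "Upper A", "Lower B", "Lower B"),
  Date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-05", "2021-01-10",
                   "2021-01-01", "2021-01-03"))
)

# Runs only make sense on sorted dates within each room.
rooms <- rooms[order(rooms$Room, rooms$Date), ]
day <- as.numeric(rooms$Date)  # days since 1970-01-01

# Goal1: start a new run whenever the gap to the previous date is not 1 day
# (duplicate dates, with a gap of 0, would also start a new run).
rooms$Goal1 <- ave(day, rooms$Room, FUN = function(d) cumsum(c(1, diff(d) != 1)))

# Goal2 / Goal3: first and last date of each run within each room.
run_key <- paste(rooms$Room, rooms$Goal1)
rooms$Goal2 <- as.Date(ave(day, run_key, FUN = min), origin = "1970-01-01")
rooms$Goal3 <- as.Date(ave(day, run_key, FUN = max), origin = "1970-01-01")
print(rooms)
```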

Theoretical Question: Data.table vs Data.frame with Big Data

I know that I can read in a very large csv file much faster with fread using the data.table library than with read.csv that reads a file in as a data.frame. However, dplyr can only perform operations on data.frame. My questions are: Why was dplyr built to work with the slower of the two data structures? When working with big data is it good practice to read in as data.table then convert to data.frame to perform dplyr operations? Is there …
Category: Data Science

R: Rates of change from an initial value

I have a collection of CSVs and must produce yearly rates of change per group within each CSV, as well as a rate of change relative to the initial value. I am using the function below to calculate yearly rates of change, and it works fine in my loop:

func <- function(x, n = 1) {
  c(rep(NA, n), diff(x, n) / head(x, -1 * n) * 100)
}
df$RateOfChange <- ave(df$Data, factor(df$Group), FUN = func)

Year  Group  Data  RateOfChange
2010  1a     3.5   NA
2011  1a     4     14.29
2012  …
Topic: dplyr dataframe r
Category: Data Science
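The same ave() pattern can produce the change relative to the initial value: within each group, divide every value by the group's first value. A sketch assuming rows are in Year order within each group (hence the sort); the 2010 and 2011 rows for group 1a come from the question, while the 2012 row and group 1b are hypothetical. from_initial is an illustrative name.

```r
df <- data.frame(
  Year  = c(2010, 2011, 2012, 2010, 2011),
  Group = c("1a", "1a", "1a", "1b", "1b"),
  Data  = c(3.5, 4, 5, 2, 3)
)

# Both calculations assume Year order within each group.
df <- df[order(df$Group, df$Year), ]

# Year-over-year change in percent (the function from the question).
func <- function(x, n = 1) {
  c(rep(NA, n), diff(x, n) / head(x, -1 * n) * 100)
}
# Change relative to the group's first (initial) value, in percent.
from_initial <- function(x) (x / x[1] - 1) * 100

df$RateOfChange      <- ave(df$Data, factor(df$Group), FUN = func)
df$ChangeFromInitial <- ave(df$Data, factor(df$Group), FUN = from_initial)
print(df)
```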

Find the mode value and frequency in R

I'm trying to write a function in R that gives the mode of a column along with the number of times (the frequency) that value occurs. I want it to exclude missing (or blank) values and, in case of ties, show all tied values. When there are no repeating values, I want it to return the first value that appears, along with its frequency of 1.

Name  Color
Drew  Blue
Drew  Green
Drew  Red
Bob   Green
Bob   Green
…
Category: Data Science
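A base-R sketch of the rules stated in the question: drop NA and empty strings, count each distinct value in order of first appearance, return every value tied for the highest count, and fall back to the first value when nothing repeats. The function name mode_with_freq is hypothetical; the data are the rows shown in the excerpt. It can be applied to a whole column or, via split(), per Name.

```r
mode_with_freq <- function(x) {
  x <- x[!is.na(x) & x != ""]            # drop missing and blank values
  if (length(x) == 0) {
    return(data.frame(value = character(0), freq = integer(0)))
  }
  u      <- unique(x)                    # distinct values, in order of first appearance
  counts <- tabulate(match(x, u))        # frequency of each distinct value
  top    <- max(counts)
  # No repeats: only the first value; otherwise every value tied at the top.
  winners <- if (top == 1) u[1] else u[counts == top]
  data.frame(value = winners, freq = top)
}

obs <- data.frame(
  Name  = c("Drew", "Drew", "Drew", "Bob", "Bob"),
  Color = c("Blue", "Green", "Red", "Green", "Green")
)

mode_with_freq(obs$Color)                          # whole column
lapply(split(obs$Color, obs$Name), mode_with_freq) # one result per Name
```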

Weighted mean with summarise_at dplyr

I have to use summarise_at() to compute a weighted mean, with weights taken from the values of another column:

df %>% summarise_at(.vars = vars(FACTOR,tv:`smart tv/console`), .funs = weighted.mean, w=INVESTMENT, na.rm=TRUE)

It always shows the error: 'INVESTMENT' is not found. I then tried:

df %>% summarise_at(.vars = vars(FACTOR,tv:`smart tv/console`), .funs = weighted.mean, w=vars(INVESTMENT), na.rm=TRUE)

But in this case I get: Evaluation error: 'x' and 'w' must have the same length. Why is this? Am I doing anything wrong? Do you …
Category: Data Science
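A likely explanation for both errors: extra arguments such as w = INVESTMENT are passed on to weighted.mean() through ..., where they are evaluated as ordinary R objects rather than looked up as columns, so INVESTMENT is not found; vars(INVESTMENT) is a list of length 1, hence the length mismatch. Putting INVESTMENT inside a lambda makes it part of the expression dplyr evaluates against the data. The sketch uses across(), which supersedes summarise_at() from dplyr 1.0 onward; with summarise_at() the analogous lambda is ~ weighted.mean(., w = INVESTMENT, na.rm = TRUE). The columns tv and radio, and all values, are hypothetical stand-ins for tv:`smart tv/console`.

```r
library(dplyr)

# Hypothetical stand-in for the real data.
media <- data.frame(
  FACTOR     = c(1, 2, 3),
  tv         = c(10, 20, NA),
  radio      = c(5, 5, 8),
  INVESTMENT = c(1, 1, 2)
)

# The lambda mentions INVESTMENT, so it is resolved as a column of `media`.
# na.rm = TRUE drops NA values of .x (with their weights), not NA weights.
res <- media %>%
  summarise(across(c(FACTOR, tv:radio),
                   ~ weighted.mean(.x, w = INVESTMENT, na.rm = TRUE)))
print(res)
```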

How to split data in R using dplyr if we want to have rows of the same group to belong to the same split?

In my current pipeline, I suspect there is data leakage: the same person, though with slightly different values, appears in both the training and the test set. As a result, my model is overfitting. For example, my data looks like this:

PID       Var_1  Var_2
Person A  0      1
Person B  0      1
Person C  0      0
Person A  1      3
Person B  1      2
Person D  0      1
Person C  0      1

I want to split this …
Category: Data Science
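The usual fix is to sample person identifiers rather than rows, so every row of a given person ends up on the same side. A base-R sketch using the rows from the excerpt; the 75/25 proportion and the seed are arbitrary choices, and because whole people are assigned, the row proportion only approximates 75%. With dplyr, distinct(PID) followed by slice_sample(prop = 0.75), then semi_join() and anti_join() on PID, expresses the same idea.

```r
set.seed(42)  # reproducible split

people <- data.frame(
  PID   = c("Person A", "Person B", "Person C", "Person A",
            "Person B", "Person D", "Person C"),
  Var_1 = c(0, 0, 0, 1, 1, 0, 0),
  Var_2 = c(1, 1, 0, 3, 2, 1, 1)
)

# Sample people, not rows, so each person lands in exactly one split.
ids       <- unique(people$PID)
train_ids <- sample(ids, size = round(0.75 * length(ids)))

train <- people[people$PID %in% train_ids, ]
test  <- people[!people$PID %in% train_ids, ]
```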

How to create a barplot using multiple columns

My data is about board games, each of which has one to many categories, like this:

categorie1  categorie2  categorie3
Deduction   Medieval    Word Game
Deduction   Medieval    NA
Card Game   Medieval    Zombies
Horror      NA          NA
Horror      Medieval    Zombies

I would like to create a barplot showing the most common categories across games, but I can't figure out how to do it with multiple columns instead of one. Is there a dplyr method?
Topic: dplyr ggplot2 r
Category: Data Science
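The trick is to stack the category columns into a single vector before counting. A base-R sketch using the example rows as read from the excerpt (one row per game, NA meaning no further category); table() drops NA by default. The tidyverse route would be tidyr::pivot_longer(everything(), values_drop_na = TRUE), then count(value), then ggplot() with geom_col().

```r
# The example from the question, one row per game.
games <- data.frame(
  categorie1 = c("Deduction", "Deduction", "Card Game", "Horror", "Horror"),
  categorie2 = c("Medieval", "Medieval", "Medieval", NA, "Medieval"),
  categorie3 = c("Word Game", NA, "Zombies", NA, "Zombies")
)

# Stack all category columns into one vector, count, and sort descending.
counts <- sort(table(unlist(games)), decreasing = TRUE)
print(counts)

barplot(counts, las = 2, ylab = "Number of games",
        main = "Most common categories")
```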

Sort a data frame column based on another sorted column value in R

I have a data frame that is sorted by one (numeric) column in order to assign a rank. Where this column's value is zero, those rows should instead be arranged by another, character column. But to assign the rank I have to consider var2, which is why I sorted by var2; if there are identical values in var2 for those rows, I have to consider …
Topic: dplyr r
Category: Data Science

R: Producing multiple plots (ggplot, geom_point) from a single CSV with multiple subcategories

I have a collection of bacteria data from approximately 140 monitoring locations in California. I would like to produce a scatterplot for each monitoring location with the Sampling Date on the Y-axis and the Bacteria Data on the X-axis. The Sampling Date, Bacteria Data, and Monitoring Location each reside in their own column. I've come up with the code below:

## Create List of Files ##
filenames <- list.files(path = "C:\\Users\\...")
## Combine into one CSV ##
All_Data <- ldply(filenames, …
Topic: dplyr ggplot2 r
Category: Data Science
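With around 140 locations, one file per location is probably easier to read than a single faceted plot. A sketch with ggplot2, assuming the combined data frame has columns named MonitoringLocation, SampleDate and Bacteria (hypothetical names, since the real ones are not shown) and using made-up toy values: split() divides the data by location, lapply() builds one geom_point() plot per piece, and ggsave() writes each to a PDF, here in a temporary directory.

```r
library(ggplot2)

# Made-up stand-in for All_Data; the column names are assumptions.
all_data <- data.frame(
  MonitoringLocation = rep(c("Site 1", "Site 2"), each = 3),
  SampleDate = as.Date("2021-01-01") + c(0, 7, 14, 0, 7, 14),
  Bacteria   = c(10, 250, 40, 5, 8, 900)
)

# One scatterplot per location: Bacteria on x, Sampling Date on y.
plots <- lapply(split(all_data, all_data$MonitoringLocation), function(site) {
  ggplot(site, aes(x = Bacteria, y = SampleDate)) +
    geom_point() +
    labs(title = site$MonitoringLocation[1],
         x = "Bacteria", y = "Sampling date")
})

# Save each plot to its own PDF, named after the location.
out_dir <- tempdir()
files <- file.path(out_dir, paste0(make.names(names(plots)), ".pdf"))
for (i in seq_along(plots)) {
  ggsave(files[i], plots[[i]], width = 6, height = 4)
}
```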

R summarise with condition

I have customer data with the products each customer purchased and the purchase date. I want to extract a result that shows each customer and the first two fruits they purchased. My actual dataset has 90,000 rows with 9,000 unique customers. I have tried the group_by and summarise functions, but I would like to be able to use summarise with a condition, the way a SQL SELECT uses a WHERE clause. Thanks for your suggestions.
Topic: dplyr r
Category: Data Science
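Rather than a conditional summarise, it may be simpler to order the rows by purchase date within each customer and keep the first two. A base-R sketch with hypothetical column names (customer, fruit, date) and invented rows; ave() numbers the purchases within each customer. In dplyr, arrange(customer, date) %>% group_by(customer) %>% slice_head(n = 2) would be the equivalent.

```r
purchases <- data.frame(
  customer = c("c1", "c1", "c1", "c2", "c2"),
  fruit    = c("apple", "banana", "cherry", "kiwi", "pear"),
  date     = as.Date(c("2021-03-01", "2021-01-15", "2021-02-10",
                       "2021-05-05", "2021-04-01"))
)

# Sort by customer, then by purchase date.
purchases <- purchases[order(purchases$customer, purchases$date), ]

# Number each purchase within its customer: 1, 2, 3, ...
purchase_no <- ave(seq_len(nrow(purchases)), purchases$customer, FUN = seq_along)

# Keep each customer's first two purchases.
first_two <- purchases[purchase_no <= 2, ]
print(first_two)
```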

Group_by field is not showing in the summarise output in R

In R, using the dplyr package, I tried the summarise() function and expected the result to include the group_by field. However, all of a sudden I see summarized output without the group_by field, which makes the results meaningless. Does anyone have an idea why? For example, if the data has ages and salaries for members of a team and I try to view the average Age and Salary by Team, the code below returns just the averages without the team …
Topic: dplyr r
Category: Data Science
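The code in the question is truncated, so this is an assumption about the cause: a frequent reason for this symptom is that the plyr package was loaded after dplyr, so plyr::summarise() masks dplyr::summarise() and ignores the grouping, returning a single row. Calling dplyr::summarise() explicitly (or loading plyr before dplyr) avoids the conflict. The team data below is hypothetical.

```r
library(dplyr)

team_df <- data.frame(
  Team   = c("A", "A", "B", "B"),
  Age    = c(30, 40, 25, 35),
  Salary = c(100, 200, 150, 250)
)

# Namespace-qualified call: unaffected by a masking summarise() from plyr.
res <- team_df %>%
  group_by(Team) %>%
  dplyr::summarise(AvgAge = mean(Age), AvgSalary = mean(Salary))
print(res)

# Shows which package an unqualified summarise() currently resolves to.
print(environment(summarise))
```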

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.