R: Rates of change from an initial value

I have a collection of csvs and must produce yearly rates of change per group within each csv, as well as a rate of change compared to the initial value.

I am using the function below to calculate yearly rates of change, and it works fine through my loop.

func - function(x, n=1) {
  c(rep(NA, n), diff(x, n) / head(x, -1*n)*100)
}
 df$RateOfChange - ave(df$Data, factor(df$Group), FUN=func)


Year    Group   Data    RateOfChange
2010    1a      3.5     NA
2011    1a      4       14.29
2012    1a      3.8     -5.00
2013    1a      4.1     7.89
2014    1a      4       -2.44
2010    1b      2       NA
2011    1b      2.3     15.00
2012    1b      2.4     4.35
2013    1b      3       25.00
2014    1b      1.8     -40.00
2009    2       10      NA
2010    2       12      20.00
2011    2       13      8.33
2012    2       15      15.38
2013    2       14      -6.67
2014    2       11      -21.43

However, I also would like to calculate the percentage change in each year compared to the initial value, like so:

Year    Group   Data    BaselineChange
2010    1a      3.5     NA
2011    1a      4       14.29
2012    1a      3.8     8.57
2013    1a      4.1     17.14
2014    1a      4       14.29
2010    1b      2       NA
2011    1b      2.3     15.00
2012    1b      2.4     20.00
2013    1b      3       50.00
2014    1b      1.8     -10.00
2009    2       10      NA
2010    2       12      20.00
2011    2       13      30.00
2012    2       15      50.00
2013    2       14      40.00
2014    2       11      10.00

Surely there is a way to do this. It'd be easy enough in Excel for a single csv, but I have several hundred, each with multiple groups. I have tried dplyr::diff and dplyr::lag, but I can't get the pipeline to abide by the group variables in the way that other questions posted on this site do.

Topic dplyr dataframe r

Category Data Science


You can reproduce both of your desired outputs using dplyr. I don't know how you were trying to use lag, but one solution is to have mutate create dummy columns that you will remove at the end.

df %>%                                            # Set up the pipe
arrange(Group, Year) %>%                         # Ensures the data are in ascending order 
group_by(Group) %>%                              # Group by 'Group'
mutate(lag = lag(Data),                          # Create a dummy column 'lag'
     RateOfChange = (Data - lag) * 100 / lag,    # Calculate RateOfChange
     First = head(Data, 1),                      # Dummy column for the frist yearly value by group
     BaselineChange =                            # Calculate BaselineChange     
       case_when(Data != First ~ (Data - First) * 100 / First,
                 TRUE ~ 1 * NA)) %>%             # This line provides NA value for first year
select(Year, Group, Data,                        # Select only columns you want
       RateOfChange, BaselineChange) %>%
head(5)                                          # Print the first five rows to validate

Here is the output:

# A tibble: 5 x 5
# Groups:   Group [1]
  Year Group  Data RateOfChange BaselineChange
  <int> <fct> <dbl>        <dbl>          <dbl>
1  2010 1a      3.5        NA             NA   
2  2011 1a      4          14.3           14.3 
3  2012 1a      3.8        -5.             8.57
4  2013 1a      4.1         7.89          17.1 
5  2014 1a      4          -2.44          14.3

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.