Look for previous date in dataframe that has certain column category in R

I have the following data frame:

      Date.POSIXct       Date      WeekDay DayCategory Hour Holidays   value
1 2018-05-01 00:00:00 2018-05-01      MA    MA-MI-JU    0        0      30
2 2018-05-01 01:00:00 2018-05-01      MA    MA-MI-JU    1        0      80
3 2018-05-01 02:00:00 2018-05-01      MA    MA-MI-JU    2        0      42
4 2018-05-01 03:00:00 2018-05-01      MA    MA-MI-JU    3        0      90
5 2018-05-01 04:00:00 2018-05-01      MA    MA-MI-JU    4        0      95
6 2018-05-01 05:00:00 2018-05-01      MA    MA-MI-JU    5        0       5

DayCategory groups days of the week in the following way: Mondays goes to LU DayCategory. Tuesday, Wednesday and Thursdays go to MA-MI-JU DayCategory. Friday goes to VI, Saturdays to SA and Sundays to DO Categories respectively.

I would like to find the value for the same hour in the previous day (Date) with the same DayCategory, while Holidays remains within the same group (e.g. if one instance has holiday 0 but previous day with same DayCategory has 1, we should lookv for the previous one, etc.)

As an intermediate step and to understand the process I would like to add a column PreviousDaySameDayCategory with the Date of the previous day that has the same DayCategory that the corresponding row. Some times it will be just the same date minus seven days ("LU","VI","SA","DO") but other days it will be just one day.

Reproducible data:

library(lubridate)
Date.POSIXct - seq(as.POSIXct("2018-05-01"), as.POSIXct("2018-05-31"), "hour")
mydf - as.data.frame(Date.POSIXct)
mydf$Date - as.Date(substr(as.character(mydf$Date.POSIXct),1,10))
mydf$WeekDay - substr(toupper((weekdays(mydf$Date))),1,2)
mydf$DayCategory -as.factor(ifelse(mydf$WeekDay == "MA" | mydf$WeekDay == "MI" | mydf$WeekDay == "JU", 
                                    "MA-MI-JU", mydf$WeekDay))
mydf$Hour - hour(mydf$Date.POSIXct)
mydf$Holidays - c(rep(0, 24*7),rep(1, 24*7), rep(0, 24*16+1))
set.seed(123)
mydf$myvalue - sample.int(101,size=nrow(mydf),replace=TRUE)

I have manually started the first days and craeted a vector of how the solution should look like:

  a - rep(NA, 24)
  b - mydf$value[1:24]
  c - mydf$value[25:48]
  d - rep(NA, 24)
  e - rep(NA,24)
  f - rep(NA,24)
  g - rep(NA,24)
  h - rep(NA,24)
  i - mydf$value[169:192]
  solution - c(a,b,c,d,e,f,g,h,i)
  solution

I would appreciate any hint in the thinking process to solve this kind of problems that I face with relative frequency.

I also published this question in Stack Over Flow, I guess this kind of question fits more in this community than in StackOverflow, is this correct? I would like to delete the one that less fit the commnunity.

Topic data-table dataframe r

Category Data Science


If I understand you correctly, you want to retrieve the date of the previous same day category, same holiday class, of the same hour.

If it is the case, perhaps you can try:

1) Segment rows within the same hour, day category, and holiday group into the same cluster (or filter, etc.)

2) For each cluster, Rank the Date (first day in the group is 0, second = 1, ... nth day = n - 1, etc.) from the lowest, etc.

3) For each Rank r, assign the PreviousDaySameDayCategory by looking up the Date for row r - 1 for r - 1 >= 0 within the same cluster

It is rather straightforward to achieve with dplyr. Please let me know if the following makes sense to you.

(For some reasons, I am not able to run your solution)

library(dplyr)
rankedDf <- mydf %>%
  group_by(DayCategory, Hour, Holidays) %>%
  arrange(DayCategory, Hour, Holidays, Date) %>%
  mutate(rowRank = order(Date), previousRowRank = order(Date) - 1) %>%
  left_join(., ., by = c("previousRowRank" = "rowRank", "DayCategory", "Hour", "Holidays")) %>%
  select(
    Date.POSIXct = Date.POSIXct.x,
    Date = Date.x,
    WeekDay = WeekDay.x,
    DayCategory,
    Hour,
    Holidays,
    myvalue = myvalue.x,
    PreviousDaySameDayCategory = Date.y,
    PreviousValueSameDayCategory = myvalue.y
  )
print.data.frame(rankedDf, nrow = 500)

Note: I also include DayCategory, etc. in the arrange so that it's easier to understand how it works; the code works just fine with only Date as the parameter.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.