Look for previous date in dataframe that has certain column category in R
I have the following data frame:
Date.POSIXct Date WeekDay DayCategory Hour Holidays value
1 2018-05-01 00:00:00 2018-05-01 MA MA-MI-JU 0 0 30
2 2018-05-01 01:00:00 2018-05-01 MA MA-MI-JU 1 0 80
3 2018-05-01 02:00:00 2018-05-01 MA MA-MI-JU 2 0 42
4 2018-05-01 03:00:00 2018-05-01 MA MA-MI-JU 3 0 90
5 2018-05-01 04:00:00 2018-05-01 MA MA-MI-JU 4 0 95
6 2018-05-01 05:00:00 2018-05-01 MA MA-MI-JU 5 0 5
DayCategory groups days of the week in the following way: Mondays goes to LU DayCategory. Tuesday, Wednesday and Thursdays go to MA-MI-JU DayCategory.
Friday goes to VI, Saturdays to SA and Sundays to DO Categories respectively.
I would like to find the value for the same hour in the previous day (Date) with the same DayCategory, while Holidays remains within the same group (e.g. if one instance has holiday 0 but previous day with same DayCategory has 1, we should lookv for the previous one, etc.)
As an intermediate step and to understand the process I would like to add a column PreviousDaySameDayCategory with the Date of the previous day that has the same DayCategory that the corresponding row. Some times it will be just the same date minus seven days ("LU","VI","SA","DO") but other days it will be just one day.
Reproducible data:
library(lubridate)
Date.POSIXct - seq(as.POSIXct("2018-05-01"), as.POSIXct("2018-05-31"), "hour")
mydf - as.data.frame(Date.POSIXct)
mydf$Date - as.Date(substr(as.character(mydf$Date.POSIXct),1,10))
mydf$WeekDay - substr(toupper((weekdays(mydf$Date))),1,2)
mydf$DayCategory -as.factor(ifelse(mydf$WeekDay == "MA" | mydf$WeekDay == "MI" | mydf$WeekDay == "JU",
"MA-MI-JU", mydf$WeekDay))
mydf$Hour - hour(mydf$Date.POSIXct)
mydf$Holidays - c(rep(0, 24*7),rep(1, 24*7), rep(0, 24*16+1))
set.seed(123)
mydf$myvalue - sample.int(101,size=nrow(mydf),replace=TRUE)
I have manually started the first days and craeted a vector of how the solution should look like:
a - rep(NA, 24)
b - mydf$value[1:24]
c - mydf$value[25:48]
d - rep(NA, 24)
e - rep(NA,24)
f - rep(NA,24)
g - rep(NA,24)
h - rep(NA,24)
i - mydf$value[169:192]
solution - c(a,b,c,d,e,f,g,h,i)
solution
I would appreciate any hint in the thinking process to solve this kind of problems that I face with relative frequency.
I also published this question in Stack Over Flow, I guess this kind of question fits more in this community than in StackOverflow, is this correct? I would like to delete the one that less fit the commnunity.
Topic data-table dataframe r
Category Data Science