Look for previous date in dataframe that has certain column category in R
I have the following data frame:
Date.POSIXct Date WeekDay DayCategory Hour Holidays value
1 2018-05-01 00:00:00 2018-05-01 MA MA-MI-JU 0 0 30
2 2018-05-01 01:00:00 2018-05-01 MA MA-MI-JU 1 0 80
3 2018-05-01 02:00:00 2018-05-01 MA MA-MI-JU 2 0 42
4 2018-05-01 03:00:00 2018-05-01 MA MA-MI-JU 3 0 90
5 2018-05-01 04:00:00 2018-05-01 MA MA-MI-JU 4 0 95
6 2018-05-01 05:00:00 2018-05-01 MA MA-MI-JU 5 0 5
DayCategory groups days of the week in the following way: Mondays goes to LU
DayCategory. Tuesday, Wednesday and Thursdays go to MA-MI-JU
DayCategory.
Friday goes to VI
, Saturdays to SA
and Sundays to DO
Categories respectively.
I would like to find the value
for the same hour
in the previous day (Date
) with the same DayCategory
, while Holidays
remains within the same group (e.g. if one instance has holiday
0 but previous day with same DayCategory
has 1, we should lookv for the previous one, etc.)
As an intermediate step and to understand the process I would like to add a column PreviousDaySameDayCategory
with the Date
of the previous day that has the same DayCategory
that the corresponding row. Some times it will be just the same date
minus seven days ("LU","VI","SA","DO") but other days it will be just one day.
Reproducible data:
library(lubridate)
Date.POSIXct - seq(as.POSIXct("2018-05-01"), as.POSIXct("2018-05-31"), "hour")
mydf - as.data.frame(Date.POSIXct)
mydf$Date - as.Date(substr(as.character(mydf$Date.POSIXct),1,10))
mydf$WeekDay - substr(toupper((weekdays(mydf$Date))),1,2)
mydf$DayCategory -as.factor(ifelse(mydf$WeekDay == "MA" | mydf$WeekDay == "MI" | mydf$WeekDay == "JU",
"MA-MI-JU", mydf$WeekDay))
mydf$Hour - hour(mydf$Date.POSIXct)
mydf$Holidays - c(rep(0, 24*7),rep(1, 24*7), rep(0, 24*16+1))
set.seed(123)
mydf$myvalue - sample.int(101,size=nrow(mydf),replace=TRUE)
I have manually started the first days and craeted a vector of how the solution should look like:
a - rep(NA, 24)
b - mydf$value[1:24]
c - mydf$value[25:48]
d - rep(NA, 24)
e - rep(NA,24)
f - rep(NA,24)
g - rep(NA,24)
h - rep(NA,24)
i - mydf$value[169:192]
solution - c(a,b,c,d,e,f,g,h,i)
solution
I would appreciate any hint in the thinking process to solve this kind of problems that I face with relative frequency.
I also published this question in Stack Over Flow, I guess this kind of question fits more in this community than in StackOverflow, is this correct? I would like to delete the one that less fit the commnunity.
Topic data-table dataframe r
Category Data Science