Does this cause data leakage in time series? # Need help understanding time series data

I have already read the related question "Data leakage when scaling time series".

Data leakage is when information from outside the training dataset is used to create the model.

Assume the input window is the past 3 days and the prediction horizon is 2 days.

Does this lead to data leakage in time series? I am not sure about this.

In both figures, the test Y comes after the train/validation Y, but the test X overlaps the train/validation time series, because the sliding window can draw on the whole dataset.

By the definition above, the model never sees the future Y of the test set during training, so I think there is no leakage.

1st sample: days 1 to 3 as input, days 4 to 5 as output

2nd sample: days 2 to 4 as input, days 5 to 6 as output

3rd sample: days 3 to 5 as input, days 6 to 7 as output

4th sample: days 4 to 6 as input, days 7 to 8 as output

5th sample: days 5 to 7 as input, days 8 to 9 as output

6th sample (test): days 7 to 9 as input, days 10 to 11 as output
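To make the sample layout concrete, here is a minimal sketch of how these windows could be generated. The function name `make_windows` and the 11-day range are my own assumptions for illustration, not anything from a specific library. Note that the window with days 6 to 8 as input is skipped, because the test sample is shifted by one extra step.

```python
def make_windows(n_days, n_in=3, n_out=2):
    """Return (input_days, output_days) pairs for every sliding window.

    Days are numbered from 1, matching the samples listed above.
    """
    windows = []
    # The last valid start day leaves room for n_in inputs and n_out outputs.
    for start in range(1, n_days - n_in - n_out + 2):
        x_days = list(range(start, start + n_in))
        y_days = list(range(start + n_in, start + n_in + n_out))
        windows.append((x_days, y_days))
    return windows


windows = make_windows(11)   # assumed: 11 days of data in total
train_valid = windows[:5]    # samples 1 to 5 from the list above
test_sample = windows[-1]    # days 7-9 as input, days 10-11 as output

for x_days, y_days in train_valid + [test_sample]:
    print(f"input days {x_days} -> output days {y_days}")
```

The window at index 5 (days 6 to 8 as input, days 9 to 10 as output) is deliberately left out of both sets here, which mirrors the extra shift applied to the test sample.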

Also, the validation X and Y overlap the training time series. Does that lead to leakage too? Should I shift the validation data by one more column (as I did for the test sample)?
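To see exactly which days are shared, here is a rough sketch that checks a held-out sample against the samples seen earlier. The helper names `days_used` and `overlap` are hypothetical, and treating samples 1 to 4 as train and sample 5 as validation is only an assumption, since the split is not stated above. It only reports which days overlap; it does not decide whether that overlap counts as leakage.

```python
def days_used(samples):
    """Collect every day that appears in the inputs or outputs of the samples."""
    days = set()
    for x_days, y_days in samples:
        days.update(x_days)
        days.update(y_days)
    return days


def overlap(seen_samples, held_out_sample):
    """Return the held-out input days and output days already present in seen_samples."""
    x_days, y_days = held_out_sample
    seen = days_used(seen_samples)
    return sorted(seen & set(x_days)), sorted(seen & set(y_days))


samples = [
    ([1, 2, 3], [4, 5]),
    ([2, 3, 4], [5, 6]),
    ([3, 4, 5], [6, 7]),
    ([4, 5, 6], [7, 8]),
    ([5, 6, 7], [8, 9]),
]
test_sample = ([7, 8, 9], [10, 11])

# Assumption: samples 1-4 are train and sample 5 is validation.
valid_x_overlap, valid_y_overlap = overlap(samples[:4], samples[4])
# The test sample is compared against all five earlier samples.
test_x_overlap, test_y_overlap = overlap(samples, test_sample)

print("validation:", valid_x_overlap, valid_y_overlap)  # [5, 6, 7] [8]
print("test:", test_x_overlap, test_y_overlap)          # [7, 8, 9] []
```

Under these assumptions, the validation target for day 8 also appears as a training target (in the 4th sample), while the test targets for days 10 and 11 appear in none of the earlier samples.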

Note that no shuffling is done here, since I know shuffling leads to data leakage in time series.

