How to impute the missing values in time series for long periods

I have electrical consumption data for 2016-2019, recorded every 30 minutes over the 4 years. There is no data between 13/03/2019 and 31/03/2019. I started with pandas.DataFrame.interpolate and tried almost all of its methods, but none of them fixed the problem. You can see some of the results below.

  • df.interpolate(method="nearest")

  • df.interpolate(method="akima")

  • df.interpolate(method="time")
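To illustrate why interpolation struggles with a gap this long, here is a minimal sketch on synthetic data (the series `load` and its daily sine cycle are assumptions made up for illustration, not the real consumption data). With method="time", the gap is bridged by a straight line between the two surrounding known points, so the daily pattern disappears.

```python
import numpy as np
import pandas as pd

# Synthetic half-hourly series with a daily cycle (illustrative data only).
idx = pd.date_range("2019-03-01", "2019-04-10 23:30", freq="30min")
hours = idx.hour + idx.minute / 60
load = pd.Series(10 + 5 * np.sin(2 * np.pi * hours / 24), index=idx)

# Blank out the same window as in the question.
gap = (idx >= pd.Timestamp("2019-03-13")) & (idx < pd.Timestamp("2019-04-01"))
load_missing = load.copy()
load_missing[gap] = np.nan

# Time-weighted linear interpolation across the gap.
filled = load_missing.interpolate(method="time")

# Compare the variability of the true and the filled values inside the gap.
print("std of true values in gap:  ", load[gap].std())
print("std of filled values in gap:", filled[gap].std())
```

The filled values have far less spread than the true ones, which matches the flat or ramp-like segments that the interpolation methods tend to produce over long gaps.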

Now I am thinking of using the data from the same period of the previous year, March 2018, to fill the missing values in March 2019.

  • Do you think this is the best way to handle the problem? If not, do you have other suggestions? I would also like to know whether there are packages designed for this problem.
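A minimal sketch of the "copy last year's data" idea, under stated assumptions: the helper `fill_from_previous_year` and the synthetic series `consumption` are hypothetical names introduced here, and the donor values are taken from 52 weeks (364 days) earlier rather than exactly one calendar year, so that weekdays line up.

```python
import numpy as np
import pandas as pd

def fill_from_previous_year(series, gap_start, gap_end, weeks_back=52):
    """Fill NaNs in [gap_start, gap_end] with values from `weeks_back` weeks earlier."""
    filled = series.copy()
    gap_index = filled.loc[gap_start:gap_end].index
    # Keep only the timestamps that are actually missing.
    gap_index = gap_index[filled.loc[gap_index].isna().to_numpy()]
    # Look up the donor values at the shifted timestamps.
    donor_index = gap_index - pd.Timedelta(weeks=weeks_back)
    donor = series.reindex(donor_index)
    filled.loc[gap_index] = donor.to_numpy()
    return filled

# Synthetic half-hourly data standing in for the real consumption series.
idx = pd.date_range("2018-01-01", "2019-12-31 23:30", freq="30min")
rng = np.random.default_rng(0)
consumption = pd.Series(rng.uniform(1.0, 5.0, len(idx)), index=idx)
consumption.loc["2019-03-13":"2019-03-31"] = np.nan

filled = fill_from_previous_year(consumption, "2019-03-13", "2019-03-31")
print(filled.loc["2019-03-13":"2019-03-31"].isna().sum())
```

This copies the donor values unchanged. If the overall consumption level differs between the two years, the copied segment may need rescaling, which this sketch does not attempt.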



There are many counter-examples where data from the same period a year earlier is a poor basis for inferring the values missing a year later.

I suggest you take a look at the Darts package which is tailored for time series.

As a suggestion, say that you have to infer $m$ missing values. You can proceed as follows. Suppose that you have trained a forecasting model $f(\cdot)$ that forecasts the $(n+1)$-th value, say $\hat{v}$, from a generic sequence of $n$ values, say $\langle v_1,v_2,\ldots,v_n \rangle$, that is: $$ \hat{v} = f(\langle v_1,v_2,\ldots,v_n \rangle). $$

To predict the first missing value, say $\hat{v}_1$, out of $m$, call:

$$ \hat{v}_1 = f(\langle v_1,v_2,\ldots,v_n \rangle) $$

where the sequence $\langle v_1,v_2,\ldots,v_n \rangle$ represents the last $n$ values that are known before the first missing value. Now, recursively, having the predicted sequence $\langle \hat{v}_1, \ldots, \hat{v}_{i-1} \rangle$, one can predict the $i$-th value out of $m$, for $1 < i \le m$, by calling:

$$ \hat{v}_i = f(\langle v_i,v_{i+1},\ldots,v_n,\hat{v}_1,\hat{v}_2,\ldots,\hat{v}_{i-1} \rangle). $$

There are pros and cons to this approach. An advantage is that we do not need to build a dedicated model of the missing data and infer from it. A disadvantage is that the error accumulates as we infer each missing value in turn, because each prediction (i.e., each inferred missing value) is fed back as input for the next one; that is, as $m$ increases, the error increases.
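The recursive scheme above can be sketched as follows. This is only an illustration under assumptions: `fit_linear_ar` is a hypothetical stand-in for the forecasting model $f$ (a plain least-squares linear autoregression in NumPy), and the data is synthetic. Any one-step-ahead model, for example one trained with Darts, could take its place.

```python
import numpy as np

def fit_linear_ar(history, n):
    """Least-squares fit of a linear model mapping n past values to the next one."""
    X = np.array([history[t:t + n] for t in range(len(history) - n)])
    y = history[n:]
    X = np.column_stack([X, np.ones(len(X))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda window: float(np.append(window, 1.0) @ coef)

def recursive_forecast(f, known, n, m):
    """Predict m values one at a time, feeding each prediction back into the window."""
    window = list(known[-n:])  # last n known values before the gap
    preds = []
    for _ in range(m):
        v_hat = f(np.asarray(window))
        preds.append(v_hat)
        window = window[1:] + [v_hat]  # drop the oldest value, append the prediction
    return np.array(preds)

# Synthetic half-hourly signal with a daily cycle plus noise (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(48 * 30)
history = 10 + 5 * np.sin(2 * np.pi * t / 48) + rng.normal(0, 0.1, len(t))

n, m = 48, 96  # window of one day, gap of two days
f = fit_linear_ar(history, n)
preds = recursive_forecast(f, history, n, m)
print(preds[:5])
```

After $n$ steps the window consists entirely of predicted values, which is where the accumulated error described above becomes most visible.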
