Python: SARIMAX model fits too slowly

I have time series data with the date and temperature records of a city. Here are my observations from the time series analysis:

  1. Plotting date vs. temperature shows clear seasonality.
  2. The augmented Dickey-Fuller test (adfuller) indicates that the data is already stationary, so d=0.
  3. From the partial autocorrelation (PACF) and autocorrelation (ACF) plots of the first seasonal difference, I found p=2 and q=10, respectively.
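A first seasonal difference subtracts the value one season earlier, y_t - y_(t-12). The sketch below uses synthetic data (not the asker's) and assumes monthly observations, so the season length is 12, matching s=12 in the question's seasonal_order. The ACF/PACF of the differenced series could then be plotted, for example with statsmodels' plot_acf and plot_pacf.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: a 12-month seasonal cycle plus a small linear trend.
t = np.arange(48)
temperature = pd.Series(
    10 + 8 * np.sin(2 * np.pi * t / 12) + 0.1 * t,
    index=pd.date_range("2000-01-01", periods=48, freq="MS"),
    name="temperature",
)

# First seasonal difference: y_t - y_(t-12).
# The first 12 values have no value one season earlier, so they are NaN.
seasonal_diff = temperature.diff(12)

# The sine cycle cancels out; only the trend's 12-month change (0.1 * 12) remains.
print(seasonal_diff.dropna().round(6).unique())  # [1.2]
```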

Code to Train Model

import statsmodels.api as sm

# order=(p, d, q), seasonal_order=(P, D, Q, s)
model = sm.tsa.statespace.SARIMAX(df['temperature'], order=(1, 1, 1), seasonal_order=(2, 0, 10, 12))
results = model.fit()
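One likely contributor to the slow fit: SARIMAX multiplies the non-seasonal and seasonal lag polynomials, so order=(1, 1, 1) with seasonal_order=(2, 0, 10, 12) gives an MA polynomial that reaches lag 121. The helper lag_set below is hypothetical (not part of statsmodels) and ignores differencing; it just enumerates which lags can carry a non-zero coefficient. In the state-space form statsmodels uses, the state vector grows roughly with the largest lag, so every Kalman filter pass inside fit() works with matrices of about that size, and the optimizer also has ten seasonal MA coefficients to search over.

```python
def lag_set(nonseasonal_order, seasonal_order, s):
    """Lags with a possibly non-zero coefficient in the product
    (1 + c_1 L + ... + c_k L^k)(1 + C_1 L^s + ... + C_K L^(K*s)),
    ignoring any differencing."""
    nonseasonal = range(0, nonseasonal_order + 1)
    seasonal = range(0, seasonal_order * s + 1, s)
    # Multiplying lag polynomials adds the exponents pairwise; drop the constant term.
    return sorted({a + b for a in nonseasonal for b in seasonal} - {0})


# Orders from the question: order=(1, 1, 1), seasonal_order=(2, 0, 10, 12)
ar_lags = lag_set(1, 2, 12)   # p=1, P=2
ma_lags = lag_set(1, 10, 12)  # q=1, Q=10

print(ar_lags)       # [1, 12, 13, 24, 25]
print(max(ma_lags))  # 121
print(len(ma_lags))  # 21
```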

The fit() call runs indefinitely and never returns a result. I am running it on a Google Colab CPU.

How can I fix this?

Topic colab python-3.x arima time-series python

Category Data Science


Assuming you have multiple cities in the dataframe, you can create some new features in it. For example, I created the features below to try to match your PACF and ACF plots.

df['lag_1'] = df.groupby(['city'])['temperature'].transform(lambda x: x.shift(1))

d = 1:

df['d_1'] = df['temperature'] - df['lag_1']

p = 1:

df['p_1'] = df.groupby(['city'])['d_1'].transform(lambda x: x.shift(1))

q = 1:

df['ma_1'] = df.groupby(['city'])['d_1'].transform(lambda x: x.shift(1).rolling(1).mean())
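A small sketch on hypothetical two-city data shows what these columns contain. The groupby keeps each city's first row NaN instead of borrowing the previous city's last value. Also, with a window of 1 the rolling mean in ma_1 is just the shifted series, so ma_1 duplicates p_1; a true MA term is built from past forecast errors, which only exist once a model has been fitted, so a rolling mean is at best a proxy. groupby(...).shift(1) is used here as an equivalent shorthand for transform(lambda x: x.shift(1)).

```python
import pandas as pd

# Hypothetical data: two cities, four time steps each, rows sorted by date within each city.
df = pd.DataFrame({
    "city": ["A"] * 4 + ["B"] * 4,
    "temperature": [10.0, 12.0, 15.0, 14.0, 20.0, 21.0, 19.0, 22.0],
})

# y_(t-1) per city; the first row of each city has no predecessor and stays NaN.
df["lag_1"] = df.groupby("city")["temperature"].shift(1)
# First difference (d = 1).
df["d_1"] = df["temperature"] - df["lag_1"]
# Lagged difference (p = 1).
df["p_1"] = df.groupby("city")["d_1"].shift(1)
# Same expression as in the answer; with window 1 it equals p_1 exactly.
df["ma_1"] = df.groupby("city")["d_1"].transform(lambda x: x.shift(1).rolling(1).mean())

print(df)
```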

P = 2 (and the other seasonal lags, up to lag 120):

# lag_t12, lag_t24, ..., lag_t120: seasonal lags in steps of 12
for k in range(1, 11):
    df[f'lag_t{12 * k}'] = df.groupby(['city'])['temperature'].shift(12 * k)

Q = 10:

# Row-wise mean of the seasonal lag columns
df['Q_10'] = df[[col for col in df.columns if col.startswith('lag_t')]].mean(axis=1)
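To check the row-wise averaging, here is a toy run on a single hypothetical city where temperature equals the month index, so every lag can be verified by hand. One detail worth knowing: mean(axis=1) skips NaN by default, so rows with less than 120 months of history average only the lags that exist; skipna=False leaves those rows as NaN instead.

```python
import numpy as np
import pandas as pd

# Hypothetical single-city data: temperature equals the month index.
df = pd.DataFrame({"city": "A", "temperature": np.arange(130, dtype=float)})

# Seasonal lags 12, 24, ..., 120 within each city.
for k in range(1, 11):
    df[f"lag_t{12 * k}"] = df.groupby("city")["temperature"].shift(12 * k)

lag_cols = [col for col in df.columns if col.startswith("lag_t")]

# Default: NaN lags are skipped, so early rows average whatever history exists.
df["Q_10"] = df[lag_cols].mean(axis=1)
# Stricter variant: NaN unless all ten seasonal lags are available.
df["Q_10_strict"] = df[lag_cols].mean(axis=1, skipna=False)

# Row 125: 125 - mean(12, 24, ..., 120) = 125 - 66
print(df.loc[125, "Q_10"])  # 59.0
```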

After this, try LightGBM, XGBoost, or another regression package to regress temperature (your target variable) on these newly created features.
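A minimal sketch of the regression step on synthetic single-city data. Because LightGBM and XGBoost are not always installed, scikit-learn's GradientBoostingRegressor is swapped in here; the scikit-learn wrappers lightgbm.LGBMRegressor and xgboost.XGBRegressor follow the same fit/predict pattern. The last 24 months are held out chronologically, since a random split would let the model train on the future.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical single-city monthly data: seasonal cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(144)
df = pd.DataFrame({
    "city": "A",
    "temperature": 10 + 8 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size),
})

# A few of the lag features described above, computed per city.
df["lag_1"] = df.groupby("city")["temperature"].shift(1)
for k in (12, 24):
    df[f"lag_t{k}"] = df.groupby("city")["temperature"].shift(k)

features = ["lag_1", "lag_t12", "lag_t24"]
data = df.dropna(subset=features)  # early rows lack enough history

# Chronological split: train on the past, test on the final 24 months.
train, test = data.iloc[:-24], data.iloc[-24:]

model = GradientBoostingRegressor(random_state=0)
model.fit(train[features], train["temperature"])

pred = model.predict(test[features])
rmse = np.sqrt(mean_squared_error(test["temperature"], pred))
print(f"test RMSE: {rmse:.3f}")
```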

Alternatively, you can forgo the ACF/PACF approach altogether and instead create a set of commonly used features using:

  • shift
  • rolling mean
  • rolling standard deviations
  • max() and min() within groups

then regress against those and check which features minimize RMSE/AIC/BIC as you tune your regression hyperparameters.
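The feature list above might look like this in pandas, on hypothetical two-city data. The window length of 3 is arbitrary, and "max() and min() within groups" is interpreted here as the running max/min of past values per city (an assumption: a plain group-wide max would leak future values into earlier rows). Every feature is shifted by one step so the current temperature, the target, never appears in its own features.

```python
import pandas as pd

# Hypothetical data: two cities, rows sorted by date within each city.
df = pd.DataFrame({
    "city": ["A"] * 5 + ["B"] * 5,
    "temperature": [10.0, 12.0, 15.0, 14.0, 11.0, 20.0, 21.0, 19.0, 22.0, 24.0],
})

grp = df.groupby("city")["temperature"]

# shift: previous observation within each city
df["shift_1"] = grp.shift(1)
# rolling mean / std over the previous 3 observations (NaN until 3 past values exist)
df["roll_mean_3"] = grp.transform(lambda x: x.shift(1).rolling(3).mean())
df["roll_std_3"] = grp.transform(lambda x: x.shift(1).rolling(3).std())
# running max / min of past values within each city
df["past_max"] = grp.transform(lambda x: x.shift(1).cummax())
df["past_min"] = grp.transform(lambda x: x.shift(1).cummin())

print(df)
```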

Since cross-validation works differently for time series, consider using TimeSeriesSplit in scikit-learn. Check out this post if you want to do grouped time series cross-validation.
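A minimal sketch of TimeSeriesSplit on synthetic single-city data, scoring one seasonal lag feature with a plain LinearRegression (chosen only to keep the example short). Each fold trains on an expanding window of earlier rows and tests on the block that follows. TimeSeriesSplit splits by row position, so rows must be sorted by time; with several cities in one frame a fold boundary can land partway through a date, which is the case grouped time series cross-validation addresses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Hypothetical single-city monthly series, rows in chronological order.
rng = np.random.default_rng(1)
t = np.arange(132)
temperature = 10 + 8 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)

# One seasonal lag feature: the value 12 months earlier (first 12 months dropped).
X = temperature[:-12].reshape(-1, 1)
y = temperature[12:]

# Expanding-window splits: train on the past, test on the next block of rows.
tscv = TimeSeriesSplit(n_splits=4)
splits = list(tscv.split(X))
for train_idx, test_idx in splits:
    print(f"train rows 0-{train_idx.max()}, test rows {test_idx.min()}-{test_idx.max()}")

# cross_val_score returns negated RMSE for this scorer, so flip the sign.
scores = -cross_val_score(LinearRegression(), X, y, cv=tscv,
                          scoring="neg_root_mean_squared_error")
print("RMSE per fold:", scores.round(3))
```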

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.