GRU learns small-scale features, but misses large scales
Playing around with weather data, I have set up a simple RNN with one layer of GRUs. It is trained to predict the temperature of the next day, given the weather data of the preceding 5 days at 1-hour intervals.
What I find peculiar is that, after training for several epochs, the prediction captures a lot of the small-scale features of the data but lacks the large-scale structure. Frequently there seems to be just an offset between prediction and data, e.g. at x=900 below. On the other hand, the very steep peaks are also not fitted well.
Here is the code of the model:
from tensorflow import keras  # or `import keras`, depending on the installation

model = keras.Sequential()
# one GRU layer over the 120 hourly time steps, then a single dense output unit
model.add(keras.layers.GRU(units=120,
                           activation='relu',
                           dropout=0.1,
                           recurrent_dropout=0.4,
                           return_sequences=False,
                           input_shape=(120, 14)))
model.add(keras.layers.Dense(1))
model.compile(optimizer=keras.optimizers.RMSprop(), loss='mse')
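Training is then just a standard fit call along the following lines; the epoch count, batch size, and validation arrays here are placeholders rather than the exact values I used:

history = model.fit(training_x, training_y,
                    epochs=20,        # placeholder for "several epochs"
                    batch_size=128,   # placeholder
                    validation_data=(validation_x, validation_y))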
Each training sample has 120 hourly time steps (= 5 days) of weather data, with 15 variables at each point in time. For example, these are the first 2 of the 120 vectors in the first training sample:
print (training_x[0,0:2,:])
[[ 0.87422976 -2.0740129 -2.12744145 -2.05861548 1.04950092 -1.32397418
-1.53525603 -0.78058659 -1.53697269 -1.53946235 2.29360559 -0.01027133
-0.01893096 -0.25892163 -1.72366227]
[ 0.87183698 -2.02652589 -2.07923146 -1.96947126 1.14054157 -1.30976048
-1.49940654 -0.78875511 -1.49932401 -1.50404649 2.24182343 -0.02325901
-0.03515888 -0.09510368 -1.72366227]]
They were normalized beforehand.
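For context, the windows are built roughly like this (a minimal sketch: the file path, the temperature column index, the split index, and the 24-hour prediction horizon are placeholders for illustration, not my exact preprocessing code):

import numpy as np

def make_windows(data, lookback=120, horizon=24, target_col=1):
    """Slice hourly data of shape (time, n_features) into 5-day input
    windows and the temperature `horizon` hours after each window."""
    x, y = [], []
    for start in range(len(data) - lookback - horizon + 1):
        x.append(data[start:start + lookback])
        y.append(data[start + lookback + horizon - 1, target_col])
    return np.array(x), np.array(y)

raw = np.load('weather.npy')        # placeholder path, shape (n_hours, n_features)
mean = raw[:50000].mean(axis=0)     # statistics from the training portion only
std = raw[:50000].std(axis=0)
normalized = (raw - mean) / std     # z-score normalization

training_x, training_y = make_windows(normalized[:50000])
print(training_x[0, 0:2, :])        # the two vectors shown above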
(I should note it is based on this)
I am trying to figure out whether this is a well-known phenomenon and/or in which direction I should change my model to compensate. I am using 120 GRU units followed by a single-unit dense layer.