Neural Network regression negative performance

I have a problem with the performance of a multi layer perceptron regressor (neural network) and I cannot figure out why.

Task: I am trying to improve a time series prediction. I have predictions of a physical parameter of the last 4 years along with the quasi true values. I train the NN with the predictions for -7 days until +1 days around the day I am interested in as features, in order to obtain a better prediction for that day.

Problem: The output of the NN is worse than the feature for the day I am interested in, both for the training and the testing data. Both in terms of RMSE and MAE. I expected the output to be at least on the same level as the feature I input to the NN.

Method: Python with sklearn. I use a grid search with cross validation to get good hyper parameter. I test for different hidden layer configurations, activation functions, learning rate and regularization penalty strengths. I split the data into 66% for training and the remaining data for testing.

I am really grateful for tips how to figure out what my problem here is.

EDIT: I am using sklearn.neural_network.MLPRegressor which provides ‘identity’, ‘logistic’, ‘tanh’ and ‘relu’ as activation functions and I have teseted all of them in the grid search.

I did NOT scale the feature matrix because all features are in the same unit as the desired output and range from about -1 to +1.

EDIT2:

tuned_parameters = [{'hidden_layer_sizes': [int(2/3*number_features),
                                        (int(2/3*number_features), int(4/9*number_features)),
                                        (int(2/3*number_features), int(4/9*number_features), int(8/27*number_features))],
                 'alpha': 10.0 ** -np.arange(1, 4),
                 'activation': [identity, relu, logistic, tanh],
                 'learning_rate': ['adaptive', invscaling],
                 'solver': ['lbfgs'],
                 'early_stopping': [True],
                 'max_iter': [600]}]
                 

regr = GridSearchCV(MLPRegressor(), tuned_parameters, n_jobs=3, verbose=2)
regr.fit(feature_training_matrix, combined_training_target_vector)

Data: The prediction data I use has the following structure: for every day of the last ~4 years there were predictions made for the next 90 days. I have a text file with -90d to +90d data for every day. I try to train the NN to estimate a better prediction for the next 10 days. For this I take -7 up to +1 days around the current prediction day (1-10 days after the currently used starts to predict) as features. This means that the predcition of the day I am interested in is included as a feature.

feature example: [0.16272058, 0.13296574, 0.14213905, 0.25064893, 0.23302285,
   0.21019931, 0.20733988, 0.1466959 , 0.17029025, 0.15876942]

corresponding target: 0.174652

Topic mlp regression neural-network predictive-modeling

Category Data Science


It is quite possible that the original model does already a very good job - it is not possible to improve it. For example the true relationship between original input variables and target might be linear, so a neural network doesn't add anything.

As a test, I would increase hidden_layer_sizes and set early_stopping=False and even just doing a grid search (without CV): RMSE Training should get better than RMSE Feature. Most likely RMSE Testing will be worse, but at least you will have evidence there are no other unexpected circumstances (eg. a bug in the code).

It does make sense to use predictions of a base model as inputs. Especially if that original base model is constrained somehow, and you expect your new model to outperform the original one because in the new model you don't have that constraint. For example the original model might be linear (ARIMA or ARIMAX), while yours is nonlinear - a neural network.

Even if the true relationship between inputs and target is nonlinear, you don't have too much playroom here for a neural network (remember we want to outperform a base model). As I understand you have only ~ 4*365 observations. You can easily overtrain with many neurons, but with only a few neurons training can stuck in local minima.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.