Accuracy and Loss in MLP

I am exploring models for predicting whether a team will win or lose based on features about the team and their opponent. My training data is 15k samples with 760 numerical features. Each sample represents a game between two teams, and the features are long- and short-term statistics about each team at the time of the game (e.g., average points over the last 10 games).

My thought was to use a multi-layer perceptron as a binary classifier. Each hidden layer has batch normalization, dropout, and a ReLU activation; the output layer has a sigmoid activation. I am also using principal component analysis (PCA) to reduce the dimensionality of my dataset.
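For concreteness, here is a minimal NumPy sketch of the forward pass described above (linear → batch norm → ReLU → dropout, then a sigmoid output). The layer sizes are illustrative, batch norm uses fixed inference-time statistics, and dropout is the standard inverted variant; this is a sketch of the architecture, not the asker's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_block(x, W, b, gamma, beta, mean, var, drop_p, eps=1e-5):
    """Linear -> batch norm (fixed inference stats) -> ReLU -> inverted dropout."""
    z = x @ W + b
    z = gamma * (z - mean) / np.sqrt(var + eps) + beta   # batch norm
    z = np.maximum(z, 0.0)                               # ReLU
    mask = rng.random(z.shape) >= drop_p                 # inverted dropout
    return z * mask / (1.0 - drop_p)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 64 PCA components -> 32 hidden units -> 1 output
x = rng.standard_normal((4, 64))                         # batch of 4 samples
W1, b1 = rng.standard_normal((64, 32)) * 0.1, np.zeros(32)
h = hidden_block(x, W1, b1, gamma=np.ones(32), beta=np.zeros(32),
                 mean=np.zeros(32), var=np.ones(32), drop_p=0.2)
W2, b2 = rng.standard_normal((32, 1)) * 0.1, np.zeros(1)
p = sigmoid(h @ W2 + b2)                                 # win probability per sample
```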

I am using an automated hyper-parameter tuner with randomized search and population-based training to search for optimal hyper-parameters. I am able to tune them such that training loss and validation loss both converge to a very small value, but accuracy shows only minimal improvement.

Looking at the accuracy after one epoch of training, it is around 60% across all initial hyper-parameters, and it only reaches a maximum of about 70% after 30 epochs.

I am not really sure what to do. Does this imply my data is too noisy/random? Should I reconsider my architecture? Would I get better results with regression instead of classification? Also, since the data is sequential in nature, it might lend itself to a recurrent architecture, but I am not familiar with any RNN that is trained on multiple sequences as opposed to a single sequence.


It is hard to tell whether the problem is the model or the signal-to-noise ratio of the data. One sanity check would be to see how well a human would perform given only the most important features.
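Another cheap sanity check is to compare the network against a majority-class baseline, e.g. with scikit-learn's `DummyClassifier`. The labels below are synthetic stand-ins, just to show the pattern: if the 60% accuracy after one epoch roughly matches the class balance, the model has not learned anything yet at that point.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in: imbalanced win/loss labels (~60% wins)
y = (rng.random(1000) < 0.6).astype(int)
X = rng.standard_normal((1000, 5))   # features are irrelevant to this baseline

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
acc = baseline.score(X, y)           # accuracy of always predicting the majority class
print(f"majority-class baseline accuracy: {acc:.2f}")
```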

I do not think you want to switch to a sequence model. I have a hard time seeing that it would do better for this data. Also, do not switch to regression.

Some suggestions:

  • Try a gradient boosting model, like LightGBM or XGBoost. Gradient-boosted trees are typically strong on tabular data.
  • Do feature engineering and try to come up with features that might be good but do not exist in the data. You could aggregate to get the sequential information in there, but you already seem to be doing that to some extent.
