How can I tune LSTM hyperparameters?

If anyone can answer these, that would be great. I'm in the midst of a Final Year Project on LSTMs.

Currently, I'm stuck and confused by my LSTM code. There are 4 hyperparameters that I can play around with:

  • Look back
  • Batch size
  • LSTM units
  • No. of Epochs

Can you explain what will happen to my results if I tune each of these hyperparameters? Also, is it common to get different results each time the code is run?

Tags: epochs, hyperparameter-tuning, lstm, hyperparameter

Category: Data Science


  • Lookback: I am not sure what you are referring to. The first thing that comes to mind is `clip`, a hyperparameter that controls for vanishing/exploding gradients (gradient clipping).
  • Mini batching: There is a tradeoff between computational speed and speed of convergence. In short, *it has been observed that when using a larger batch there is a significant degradation in the quality of the model, as measured by its ability to generalize.* Hence, you need to search over the ideal size for your case (see the sketch after this list). See an excellent discussion here.
  • LSTM units: otherwise called the latent dimension of each LSTM cell, this controls the size of your hidden and cell states. The larger the value, the "bigger" the memory of your model in terms of sequential dependencies. It is also loosely tied to the size of your embedding.
  • Epochs: if you are not familiar with the notions of stochastic, mini-batch, and batch training, I suggest you familiarise yourself with them before moving any further (here or here). In essence, the number of epochs defines the number of times your model will see the entirety of your dataset.
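
As a rough illustration of that search, here is a minimal sketch (using a placeholder sine-wave series and arbitrary candidate values, not your data) of trying a few combinations of units and batch size and keeping the one with the lowest validation loss:

```python
import numpy as np
import tensorflow as tf

# placeholder data: a noiseless sine wave, windowed into (samples, look_back, 1)
series = np.sin(np.linspace(0, 50, 500))
look_back = 10
X = np.array([series[i:i + look_back] for i in range(len(series) - look_back)])[..., None]
y = series[look_back:]

results = {}
for units in (16, 32, 64):             # candidate hidden-state sizes
    for batch_size in (16, 32, 64):    # candidate batch sizes
        model = tf.keras.Sequential([
            tf.keras.layers.LSTM(units, input_shape=(look_back, 1)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        hist = model.fit(X, y, epochs=20, batch_size=batch_size,
                         validation_split=0.2, verbose=0)
        # keep the best validation loss seen for this combination
        results[(units, batch_size)] = min(hist.history["val_loss"])

best = min(results, key=results.get)
print("best (units, batch_size):", best, "val_loss:", results[best])
```

The same loop can be extended to the look-back window and the number of epochs; in practice people often use early stopping for the latter instead of searching over it directly.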

Look back: I don't know "look back" as a hyperparameter per se, but with an LSTM, when you are trying to predict the next step, you need to arrange your data by "looking back" a certain number of time steps to prepare the training set. For example, suppose you want to estimate the next value of a series observed at every time step t. You need to re-arrange your data into a shape like: {t1, t2, t3} -> t4, {t2, t3, t4} -> t5, {t3, t4, t5} -> t6. The network learns this mapping and will then be able to predict tx from the previous time steps.
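
A minimal sketch of this re-arrangement (the series and the look-back value below are just placeholders) could look like:

```python
import numpy as np

def make_windows(series, look_back):
    """Turn a 1-D series into (samples, look_back) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])   # the "looked back" window
        y.append(series[i + look_back])     # the value to predict next
    return np.array(X), np.array(y)

series = np.arange(1, 11, dtype=float)      # stands in for t1..t10
X, y = make_windows(series, look_back=3)
print(X[0], "->", y[0])                     # [1. 2. 3.] -> 4.0, i.e. {t1, t2, t3} -> t4
```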

Batch size (not specific to LSTMs) is roughly how many samples are processed per single training step. The bigger the batch size, the faster the training, but the more memory is needed. On a GPU it is better to use larger batch sizes because copying values between main memory and the GPU is slow.

LSTM units refers to how many "smart" neurons you will have, i.e. the size of the hidden state. This is highly dependent on your dataset; usually you determine it based on your input vector dimensions.
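
In Keras, for example, the `units` argument is what sets that hidden-state size; the numbers below are only illustrative:

```python
import tensorflow as tf

look_back, n_features, units = 3, 1, 32      # illustrative values, not recommendations

model = tf.keras.Sequential([
    # `units` is the dimensionality of the LSTM's hidden (and cell) state
    tf.keras.layers.LSTM(units, input_shape=(look_back, n_features)),
    tf.keras.layers.Dense(1),                # single-value next-step prediction
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```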

No. of epochs is how many times the algorithm passes over the data while fitting the observations. Too many epochs will usually overfit your model, and too few will leave it underfitted.
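
Putting these together, here is a sketch (toy sine data and arbitrary values again) of where batch size and the number of epochs enter a Keras training run, with early stopping used to guard against training for too many epochs:

```python
import numpy as np
import tensorflow as tf

# toy data: a noisy sine wave, windowed into (samples, look_back, 1) as described above
series = np.sin(np.linspace(0, 50, 1000)) + 0.1 * np.random.randn(1000)
look_back = 10
X = np.array([series[i:i + look_back] for i in range(len(series) - look_back)])[..., None]
y = series[look_back:]

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(look_back, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

model.fit(
    X, y,
    epochs=200,              # upper bound on passes over the dataset
    batch_size=64,           # samples per gradient update: larger = faster but more memory
    validation_split=0.2,
    # stop when validation loss stops improving, keeping the best weights
    callbacks=[tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)],
    verbose=0,
)
```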
