Policy gradient/REINFORCE algorithm with RNN: why does this converge with SGD but not Adam?

I am training an RNN model for caption generation with the REINFORCE algorithm. I adopt the self-critical strategy (see the paper Self-critical Sequence Training for Image Captioning) to reduce the variance. I initialize the model with a pre-trained RNN model (a.k.a. warm start). This pre-trained model (trained with a log-likelihood objective) achieves an F1 score of 0.6 on my task.
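For context, here is a minimal sketch of what my self-critical REINFORCE loss looks like, assuming TF 1.x. The tensor names (sampled_log_probs, sample_reward, greedy_reward, mask) are placeholders for illustration; in my actual model the log-probs come from the RNN decoder and the rewards are the F1 scores of the sampled and greedy captions.

```python
import tensorflow as tf

# Illustrative placeholders -- names and shapes are assumptions, not from my real code.
sampled_log_probs = tf.placeholder(tf.float32, [None, None])  # log-prob of each sampled word
sample_reward = tf.placeholder(tf.float32, [None])            # e.g. F1 of the sampled caption
greedy_reward = tf.placeholder(tf.float32, [None])            # e.g. F1 of the greedy caption (baseline)
mask = tf.placeholder(tf.float32, [None, None])               # 1 for real tokens, 0 for padding

# Self-critical baseline: subtract the greedy decode's reward from the sample's reward.
advantage = tf.stop_gradient(sample_reward - greedy_reward)   # shape [batch]

# REINFORCE: minimize the negative advantage-weighted log-likelihood of the sampled caption.
per_example_loss = -advantage * tf.reduce_sum(sampled_log_probs * mask, axis=1)
loss = tf.reduce_mean(per_example_loss)
```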

When I use the Adam optimizer to train this policy gradient objective, the performance of my model drops to 0 after a few epochs. However, if I switch to the gradient descent optimizer and keep everything else the same, the performance looks reasonable and is slightly better than the pre-trained model. Any idea why that is?

I use TensorFlow to implement my model.

Topic: policy-gradients, rnn, reinforcement-learning, deep-learning, nlp

Category: Data Science


Without the code there's not much we can do, but I'd guess you need to significantly lower the learning rate. In my experience, Adam requires a significantly lower learning rate than SGD.
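As a rough TF 1.x sketch of what I mean, the snippet below uses a toy loss as a stand-in for your policy-gradient loss; the learning rates are illustrative guesses to tune from, not values taken from your setup. If plain gradient descent behaves well at something like 0.1, a first try for Adam would be one or two orders of magnitude lower.

```python
import tensorflow as tf

# Toy loss so the snippet runs on its own; stand-in for the REINFORCE loss.
w = tf.Variable(1.0)
loss = tf.square(w)

# Illustrative learning rates (assumptions, not from the question):
sgd_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)
adam_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)
```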
