Guidelines for debugging REINFORCE-type algorithms?
I implemented a self-critical policy gradient (as described here) for text summarization.
However, after training, the results are not as good as expected (actually lower than without RL...).
I'm looking for general guidelines on how to debug RL-based algorithms.
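For context, the self-critical loss I'm optimizing is essentially REINFORCE with the reward of the greedy-decoded summary as the baseline. A minimal sketch of that loss (tensor names and shapes are illustrative, and the reward function, e.g. ROUGE, is left abstract here):

```python
import torch

def self_critical_loss(sample_log_probs, sample_reward, greedy_reward):
    """Self-critical policy-gradient loss.

    sample_log_probs: (batch, seq_len) log-probs of the sampled tokens
    sample_reward:    (batch,) reward of the sampled summaries
    greedy_reward:    (batch,) reward of the greedy-decoded summaries (baseline)
    """
    # Advantage = sampled reward minus greedy baseline; detach so no gradient
    # flows back through the reward computation.
    advantage = (sample_reward - greedy_reward).detach()
    # Sum the token log-probs over the sequence and weight by the advantage.
    seq_log_prob = sample_log_probs.sum(dim=1)
    return -(advantage * seq_log_prob).mean()
```

One thing I'm double-checking is that the sign of the advantage and the log-probs (sampled vs. greedy sequence) are not mixed up, since either mistake would quietly push the reward down.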
I tried:
- Overfitting on a small dataset (~6 samples): I could increase the average reward, but it does not converge; sometimes the average reward goes down again (see the loop sketch after this list).
- Changing the learning rate: I varied the learning rate and observed its effect on the small dataset. Based on those experiments I chose a fairly large learning rate (0.02 vs. 1e-4 in the paper).
- Watching how the average reward evolves as training on the full dataset progresses: the average reward barely moves at all...
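For the small-dataset overfitting check and the reward tracking mentioned above, my loop looks roughly like this (simplified sketch reusing `self_critical_loss` and the `torch` import from above; `model.sample_and_score`, the optimizer, and the tiny batch are placeholders for my actual pipeline):

```python
def overfit_sanity_check(model, optimizer, tiny_batch, num_steps=500):
    """Train on a handful of samples (~6) and track the average sampled reward.

    `model.sample_and_score(batch)` is a placeholder assumed to return
    (sample_log_probs, sample_reward, greedy_reward) for the batch.
    """
    reward_history = []
    for step in range(num_steps):
        sample_log_probs, sample_reward, greedy_reward = model.sample_and_score(tiny_batch)
        loss = self_critical_loss(sample_log_probs, sample_reward, greedy_reward)

        optimizer.zero_grad()
        loss.backward()
        # Gradient clipping, since exploding gradients can make the reward
        # collapse again after it has improved.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0)
        optimizer.step()

        reward_history.append(sample_reward.mean().item())
        if step % 50 == 0:
            # Moving average over the last 50 steps to smooth the high variance.
            recent = reward_history[-50:]
            print(f"step {step}: mean sampled reward = {sum(recent) / len(recent):.4f}")
    return reward_history
```

Even with this setup, the smoothed reward climbs for a while and then drifts back down on the tiny dataset, which is what makes me think I'm missing something more fundamental.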
Topic policy-gradients pytorch reinforcement-learning nlp
Category Data Science