What are the effects of clipping the reward on stability?
I am trying to stabilize my DQN training results. I found that clipping is one technique for doing this, but I don't fully understand it.
1- What are the effects of clipping the reward, clipping the gradient, and clipping the error on stability, and how do they make results more stable? (My rough understanding of each is sketched after these questions.)
2- The Nature DQN paper says they clip the reward. Would you please explain this in more detail?
3- Which of these is most effective for stability?
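To show what I mean, here is a minimal sketch of how I currently understand the three kinds of clipping in a TensorFlow/Keras DQN setup. The clip range [-1, 1], the Huber delta of 1.0, and the clipnorm value are my own assumptions, not something I have verified against the Nature paper:

```python
import numpy as np
import tensorflow as tf

# 1) Reward clipping: squash every environment reward into [-1, 1]
#    (this is what I assume the Nature paper means by "clipping the reward").
def clip_reward(reward):
    return np.clip(reward, -1.0, 1.0)

# 2) Error (TD-error) clipping via the Huber loss: quadratic for small errors,
#    linear beyond delta, so a single large TD error cannot produce a huge gradient.
huber_loss = tf.keras.losses.Huber(delta=1.0)

# 3) Gradient clipping: cap the norm of the gradients before the optimizer
#    applies them (clipnorm=1.0 is an arbitrary choice on my part).
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0)
```

Is this roughly what each technique does, or am I mixing them up?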
Topic dqn keras-rl training tensorflow deep-learning
Category Data Science