Why DQN but no Deep Sarsa?

Why is DQN used so frequently while Deep SARSA hardly ever appears? I found this paper https://arxiv.org/pdf/1702.03118.pdf using it, but nothing else that seems relevant. I assume the cause could be the Ape-X architecture, which came out the year after the Deep SARSA paper and made it possible to generate an immense amount of experience for off-policy algorithms. Does that make sense, or is there another reason?

Topic q-learning reinforcement-learning

Category Data Science


Off-policy learning allows you to use experience replay: a finite buffer that stores recent transitions, from which you randomly sample a minibatch to train the model. This is done to break the temporal correlation between consecutive transitions (they are very similar when close in time), which causes problems when training a neural network. This approach cannot be used directly with SARSA, because SARSA is on-policy: its update bootstraps from the next action actually taken, and a stored next action reflects an old version of the policy rather than the current one. I am sure someone has figured out a way to hack this together, but experience replay is not really meant to be used that way.
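To make the difference concrete, here is a minimal sketch (illustrative names, not from any particular library) of a replay buffer plus the two bootstrapping targets. The `q` function, `n_actions`, and the transition layout are assumptions for the example; the point is only that the Q-learning target needs no stored next action, while the SARSA target does.

```python
import random
from collections import deque


class ReplayBuffer:
    """Finite buffer of recent transitions (s, a, r, s_next, a_next, done)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Random sampling breaks the temporal correlation between transitions.
        return random.sample(self.buffer, batch_size)


def q_learning_target(r, s_next, done, q, n_actions, gamma=0.99):
    # Off-policy: bootstrap from the greedy action in s_next,
    # regardless of which action the (possibly old) behaviour policy took.
    if done:
        return r
    return r + gamma * max(q(s_next, a) for a in range(n_actions))


def sarsa_target(r, s_next, a_next, done, q, gamma=0.99):
    # On-policy: bootstrap from the action a_next the current policy takes
    # in s_next. A replayed a_next comes from an old policy, so the target
    # is stale -- this is why naive replay clashes with SARSA.
    if done:
        return r
    return r + gamma * q(s_next, a_next)


if __name__ == "__main__":
    q = lambda s, a: 0.0  # dummy value function, just for illustration
    buf = ReplayBuffer()
    buf.push((0, 1, 1.0, 2, 0, False))
    s, a, r, s_next, a_next, done = buf.sample(1)[0]
    print(q_learning_target(r, s_next, done, q, n_actions=2))
    print(sarsa_target(r, s_next, a_next, done, q))
```

In other words, replayed transitions remain valid training data for the Q-learning target no matter how old they are, whereas the SARSA target would need the next action to be re-chosen by the current policy, defeating the purpose of the buffer.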
