Why DQN but no Deep Sarsa?

Why is DQN used so frequently while Deep SARSA hardly ever appears? I found this paper https://arxiv.org/pdf/1702.03118.pdf using it, but nothing else that seems relevant. I assume the cause could be the Ape-X architecture, which came out the year after the Deep SARSA paper and made it possible to generate an immense amount of experience for off-policy algorithms. Does that make sense, or is there another reason?

Topic q-learning reinforcement-learning

Category Data Science


Off-policy learning allows you to use experience replay: a finite buffer that stores recent transitions, from which you randomly sample a minibatch to train the model. This is done to break the temporal correlation between consecutive transitions (they are very similar when close in time), which causes problems when training a neural network. This approach cannot be used directly with SARSA, because SARSA is on-policy: its update bootstraps from the next action actually taken, and a stored next action reflects an old version of the policy rather than the current one. I am sure someone has figured out a way to hack this together, but experience replay is not really meant to be used that way.
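To make the difference concrete, here is a minimal sketch (illustrative names, not from any particular library) of a replay buffer plus the two bootstrapping targets. The `q` function, `n_actions`, and the transition layout are assumptions for the example; the point is only that the Q-learning target needs no stored next action, while the SARSA target does.

```python
import random
from collections import deque


class ReplayBuffer:
    """Finite buffer of recent transitions (s, a, r, s_next, a_next, done)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Random sampling breaks the temporal correlation between transitions.
        return random.sample(self.buffer, batch_size)


def q_learning_target(r, s_next, done, q, n_actions, gamma=0.99):
    # Off-policy: bootstrap from the greedy action in s_next,
    # regardless of which action the (possibly old) behaviour policy took.
    if done:
        return r
    return r + gamma * max(q(s_next, a) for a in range(n_actions))


def sarsa_target(r, s_next, a_next, done, q, gamma=0.99):
    # On-policy: bootstrap from the action a_next the current policy takes
    # in s_next. A replayed a_next comes from an old policy, so the target
    # is stale -- this is why naive replay clashes with SARSA.
    if done:
        return r
    return r + gamma * q(s_next, a_next)


if __name__ == "__main__":
    q = lambda s, a: 0.0  # dummy value function, just for illustration
    buf = ReplayBuffer()
    buf.push((0, 1, 1.0, 2, 0, False))
    s, a, r, s_next, a_next, done = buf.sample(1)[0]
    print(q_learning_target(r, s_next, done, q, n_actions=2))
    print(sarsa_target(r, s_next, a_next, done, q))
```

In other words, replayed transitions remain valid training data for the Q-learning target no matter how old they are, whereas the SARSA target would need the next action to be re-chosen by the current policy, defeating the purpose of the buffer.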
