TF Agents DdqnAgent for Continuous Tasks (Non-Episodic Environments)
I would like to use TF Agents in non-episodic environments (continuing tasks without a terminal state). In such a setting, the agent keeps learning without ever resetting the environment at the end of an episode, which is normally where the episode return is computed.
I have found similar questions without answers here and there. This explanation, which is based on the concept of average rewards, seems convincing. However, I would like to know whether TF Agents already provides such functionality out of the box. I am asking because modifying the td_loss function in the DdqnAgent class is not straightforward, since any change has to work within the TensorFlow graph.
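For context, here is a minimal tabular sketch of the average-reward (differential) Q-learning update that the linked explanation describes, i.e. the kind of change one would have to replicate inside td_loss. This is plain Python for illustration only; the names (alpha, beta, avg_reward) are mine, not part of the TF Agents API.

```python
def differential_q_update(q, avg_reward, s, a, r, s_next,
                          alpha=0.1, beta=0.01):
    """One step of average-reward (differential) Q-learning.

    Instead of a discounted return, the TD target uses the reward
    minus a running estimate of the average reward per step:
        delta = r - avg_reward + max_a' Q(s', a') - Q(s, a)
    """
    best_next = max(q[s_next].values())
    td_error = r - avg_reward + best_next - q[s][a]
    q[s][a] += alpha * td_error       # update the action value
    avg_reward += beta * td_error     # update the average-reward estimate
    return avg_reward

# Tiny example with two states and two actions, all values zero-initialized.
q = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 0.0}}
avg_reward = 0.0
avg_reward = differential_q_update(q, avg_reward, s=0, a=1, r=1.0, s_next=1)
# td_error = 1.0, so q[0][1] becomes 0.1 and avg_reward becomes 0.01
```

Porting this to DdqnAgent would mean replacing the discounted bootstrap target with this differential target and maintaining avg_reward as a trainable or tracked variable, which is exactly the non-trivial graph surgery I would like to avoid if the library already supports it.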
Thank you very much in advance!
Topic dqn tensorflow reinforcement-learning
Category Data Science