How to implement reward clipping in DQN in Keras

How do I implement reward clipping in DQN in Keras? In particular, how should the clipping itself be implemented?

Is this pseudocode correct:

if reward < -threshold:
    reward = -1
elif reward > threshold:
    reward = 1
else:  # -threshold <= reward <= threshold
    reward = reward / threshold

And if the reward is always positive, how should the clipping be changed?

Topic: dqn keras-rl training tensorflow deep-learning

Category: Data Science


Since you're using keras-rl, you can use its Processor class: write a new processor and assign it to your agent. The new processor would look something like this:

import numpy as np
from rl.core import Processor

class MyProcessor(Processor):
    def process_reward(self, reward):
        """Processes the reward as obtained from the environment for use in an agent
        and returns it.

        # Arguments
            reward (float): A reward as obtained from the environment

        # Returns
            The processed reward
        """
        # Change the bounds according to your needs; this assumes your threshold is 1.
        min_reward = -1.0
        max_reward = 1.0
        return float(np.clip(reward, min_reward, max_reward))
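On your last question: when the reward is always positive, clipping symmetrically to [-1, 1] leaves the negative half of the range unused, so it is more common to rescale into [0, 1] instead. A minimal sketch, assuming threshold is an upper bound on the raw reward that you choose for your environment (the class name and default value here are illustrative, not part of keras-rl):

class PositiveRewardProcessor(Processor):
    def __init__(self, threshold=100.0):
        # Hypothetical upper bound on the raw reward; pick it from
        # what your environment actually emits.
        self.threshold = threshold

    def process_reward(self, reward):
        # Rescale [0, threshold] into [0, 1]; rewards above the bound
        # are clipped so outliers cannot dominate learning.
        return float(np.clip(reward / self.threshold, 0.0, 1.0))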

Note that there isn't a real need to cast the reward to a float, and in some agent implementations the cast could even fail (though the agent implementation would be wrong if it did). I did it because in Gym the reward is defined to be a float. As said, usually nothing goes wrong if the reward is an int, a numpy.float64, or anything else that is easily castable to float.
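To wire the processor into training, pass it to the agent via the processor argument. A minimal sketch using keras-rl's DQNAgent on CartPole; the model size, hyperparameters, and step counts are illustrative, and I'm assuming keras-rl2 with tensorflow.keras:

import gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy

env = gym.make('CartPole-v1')
nb_actions = env.action_space.n

# A small Q-network: one output per action.
model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(16, activation='relu'),
    Dense(nb_actions, activation='linear'),
])

# keras-rl calls processor.process_reward() on every reward before it
# is stored in memory and used for learning.
dqn = DQNAgent(model=model, nb_actions=nb_actions,
               memory=SequentialMemory(limit=50000, window_length=1),
               policy=EpsGreedyQPolicy(),
               processor=MyProcessor(),
               nb_steps_warmup=100, target_model_update=1e-2)
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=10000, verbose=1)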
