Unbalanced discounted reward in reinforcement learning: is it a problem?

Discounted rewards seem unbalanced to me.

If we take as an example an episode with 4 actions, where each action receives a reward of +1:

+1 -> +1 -> +1 -> +1

The discounted reward for the last action is: 1
The discounted reward for the first action (considering gamma = 1 for simplicity) is: 4

Intuitively, both actions are equally good, because each received the same reward.
Yet their total discounted rewards are different, unbalanced.


So when we backpropagate, will the first action be favored over the last one?
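The numbers in the question can be reproduced with a short sketch (the helper name `returns` is mine, not from the post):

```python
def returns(rewards, gamma=1.0):
    """Compute the return G_t (cumulative discounted reward) for every timestep t."""
    g = 0.0
    out = []
    for r in reversed(rewards):  # accumulate from the end of the episode backwards
        g = r + gamma * g
        out.append(g)
    return list(reversed(out))

# The 4-step episode from the question, each step rewarded +1, gamma = 1:
print(returns([1, 1, 1, 1]))  # [4.0, 3.0, 2.0, 1.0]
```

The first action's return is 4 while the last action's is 1, which is exactly the imbalance the question asks about.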

Topic discounted-reward reinforcement-learning

Category Data Science


Most RL problems are sequential decision-making processes: the action taken at time $t$ can influence future rewards. From this perspective, the first action led to a sequence of states and actions that produced positive rewards, so it deserves more positive feedback than the last action. It is natural to prefer the first action in the first state unless we know that some other action yields a better cumulative reward. A single reward is only a short-term gain; an action might be good in the short term but not in the long run, which is why we use cumulative discounted rewards, called returns, to estimate the value of an action in a given state.
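The trade-off described above can be seen numerically: with gamma < 1 each action's return is dominated by its near-term rewards, while earlier actions still accumulate credit for what followed. A minimal sketch (the helper `discounted_returns` and the choice gamma = 0.9 are illustrative, not from the original post):

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every t."""
    g, out = 0.0, []
    for r in reversed(rewards):  # accumulate from the end of the episode backwards
        g = r + gamma * g
        out.append(g)
    return list(reversed(out))

# Same 4-step episode as in the question, now with gamma = 0.9:
# earlier actions still receive the largest returns, but the gap between
# the first and last action shrinks compared to gamma = 1 (4, 3, 2, 1).
print([round(g, 3) for g in discounted_returns([1, 1, 1, 1], gamma=0.9)])
# [3.439, 2.71, 1.9, 1.0]
```

In a policy-gradient update the log-probability of each action is weighted by its return, so the first action does receive a larger update; this is intended credit assignment, not an error.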
