Unbalanced discounted reward in reinforcement learning: is it a problem?
Discounted rewards seem unbalanced to me.
Take as an example an episode with 4 actions, where each action receives a reward of +1:
+1 -> +1 -> +1 -> +1
The discounted reward for the last action is: 1
The discounted reward for the first action (taking gamma = 1 for simplicity) is: 4
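To make the arithmetic concrete, here is a minimal Python sketch of how I understand the discounted reward to be computed for each step: the reward at that step plus gamma times the discounted reward of the next step. The function name `discounted_returns` is my own and not taken from any library, and the second call with gamma = 0.5 is only an illustrative value.

```python
def discounted_returns(rewards, gamma=1.0):
    """Return the discounted sum of future rewards for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()  # restore chronological order
    return returns


rewards = [1.0, 1.0, 1.0, 1.0]  # the 4-action episode from the question
print(discounted_returns(rewards, gamma=1.0))  # [4.0, 3.0, 2.0, 1.0]
print(discounted_returns(rewards, gamma=0.5))  # [1.875, 1.75, 1.5, 1.0]
```

With gamma = 1 the first action gets 4 and the last gets 1, which is the imbalance described here. A smaller gamma shrinks the gap but does not remove it.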
Intuitively, both actions are equally good, because both received the same immediate reward.
But their total rewards are different, which looks unbalanced.
So when we backpropagate, will the first action be favored over the last action?
Topic discounted-reward reinforcement-learning
Category Data Science