r_t or r_{t+1} in Temporal Difference Learning?
This is probably a very simple question for most of you, but I have seen two different formulations of the TD learning update rule in different papers and can't really wrap my head around it. As an example: the English Wikipedia article (https://en.wikipedia.org/wiki/Temporal_difference_learning) updates the value based on the immediate reward, whereas the German Wikipedia article (https://de.wikipedia.org/wiki/Temporal_Difference_Learning) updates it based on the reward of the upcoming time step. Since the rest of the equation is identical in both, these cannot be the same update, can they? The reward in the current step and the reward in the upcoming step don't necessarily have to be the same.
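To make the difference concrete, here is how I read the two pages, written in a common notation (my paraphrase, not a verbatim copy of either article):

$$V(s_t) \leftarrow V(s_t) + \alpha \bigl[ r_t + \gamma V(s_{t+1}) - V(s_t) \bigr] \qquad \text{(immediate reward)}$$

$$V(s_t) \leftarrow V(s_t) + \alpha \bigl[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \bigr] \qquad \text{(next-step reward)}$$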
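And here is a minimal, self-contained TD(0) sketch on a toy random walk (my own toy setup, not taken from either article), just to pin down where the observed reward enters the update:

```python
import numpy as np

# TD(0) on a 5-state random walk: states 1..5, terminals at 0 and 6,
# reward +1 only when exiting to the right. A toy setup of my own,
# just to make the question concrete.
rng = np.random.default_rng(0)
alpha, gamma = 0.1, 1.0
V = np.zeros(7)                          # V[0] and V[6] stay 0 (terminal)

for episode in range(5000):
    s = 3                                # start in the middle
    while s not in (0, 6):
        s_next = s + rng.choice([-1, 1]) # uniform random policy
        r = 1.0 if s_next == 6 else 0.0  # reward observed on this transition
        # The update itself: is this `r` the "r_t" of the English
        # article or the "r_{t+1}" of the German one?
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print(V[1:6])  # converges toward [1/6, 2/6, 3/6, 4/6, 5/6]
```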
Please help me out here!
Topic learning reinforcement-learning
Category Data Science