r_t or r_{t+1} in Temporal Difference Learning?

This is probably a very simple question for most of you, but I have seen different formulations of the TD learning update rule in many papers and can't really wrap my head around it. As an example: in the English Wikipedia article (https://en.wikipedia.org/wiki/Temporal_difference_learning) the value is updated based on the immediate reward, whereas in the German Wikipedia article (https://de.wikipedia.org/wiki/Temporal_Difference_Learning) it is updated based on the reward of the upcoming time step. These cannot be the same equation: the rest is identical in both, and the rewards at time t and at time t+1 are not necessarily the same.
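For concreteness, here are the two variants of the tabular TD(0) update side by side as I understand them, with $\alpha$ the learning rate and $\gamma$ the discount factor (which page uses which indexing is my reading of them, so take the attribution with a grain of salt):

\[
V(s_t) \leftarrow V(s_t) + \alpha \bigl[ r_t + \gamma\, V(s_{t+1}) - V(s_t) \bigr]
\]
\[
V(s_t) \leftarrow V(s_t) + \alpha \bigl[ r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t) \bigr]
\]

Everything else is identical, so the whole difference comes down to whether the reward term is written as $r_t$ or $r_{t+1}$.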

Please help me out here!

Tags: learning, reinforcement-learning
