Identity between the TD(0) algorithm and Policy Evaluation in Dynamic Programming when alpha equals 1
The TD(0) algorithm is defined by the following iterative update:
$$ V(s) \leftarrow V(s) + \alpha({r + \gamma V(s')} - V(s) ) $$
Now, if we set $\alpha = 1$, the $V(s)$ terms cancel and the update reduces to $V(s) \leftarrow r + \gamma V(s')$, which looks like the traditional Policy Evaluation formula in Dynamic Programming. Is that correct?
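To make the comparison concrete, here is a minimal sketch that puts the two updates side by side. The tiny MDP, the value table `V`, and the helper names `td0_update` and `dp_backup` are all hypothetical, chosen only for illustration, and the fixed policy is assumed to be already folded into the transition table. It assumes the usual textbook form of DP policy evaluation, which takes an expectation over next states using a known model, whereas TD(0) uses one sampled transition.

```python
GAMMA = 0.9

# Hypothetical model for state 0 under a fixed policy:
# each entry is (probability, reward, next_state).
stochastic = {0: [(0.5, 1.0, 1), (0.5, 0.0, 2)]}
deterministic = {0: [(1.0, 1.0, 1)]}

# Arbitrary current value estimates (assumed, not learned here).
V = {0: 0.0, 1: 10.0, 2: 0.0}


def td0_update(V, s, r, s_next, alpha, gamma=GAMMA):
    """TD(0) update computed from a single sampled transition."""
    return V[s] + alpha * (r + gamma * V[s_next] - V[s])


def dp_backup(V, s, model, gamma=GAMMA):
    """Expected backup of DP policy evaluation; requires the model."""
    return sum(p * (r + gamma * V[s2]) for p, r, s2 in model[s])


# With alpha = 1, TD(0) overwrites V(s) with the sampled target.
sample_a = td0_update(V, 0, 1.0, 1, alpha=1.0)  # sampled s' = 1
sample_b = td0_update(V, 0, 0.0, 2, alpha=1.0)  # sampled s' = 2

# DP averages over both outcomes instead of using one sample.
expected_stochastic = dp_backup(V, 0, stochastic)
expected_deterministic = dp_backup(V, 0, deterministic)

print(sample_a, sample_b, expected_stochastic, expected_deterministic)
```

In this sketch the two updates agree only for the deterministic model, where there is a single possible transition. With stochastic transitions, the $\alpha = 1$ TD(0) update yields a different value depending on which next state was sampled, while the DP backup yields their probability-weighted average.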
Topic dynamic-programming reinforcement-learning
Category Data Science