Confusion about the Bellman Equation
In some resources, the Bellman equation is shown as follows:
$v_{\pi}(s) = \sum\limits_{a}\pi(a|s)\sum\limits_{s',r}p(s',r|s,a)\big[r+\gamma v_{\pi}(s')\big] $
What confuses me are the $\pi$ and $p$ parts on the right-hand side.
The probability part, $p(s',r|s,a)$, is the probability of ending up in the next state $s'$ (with reward $r$). Since reaching the next state $s'$ happens by taking a specific action, it seems to me that the $p$ part already includes the probability of taking that action.
But then, why is $\pi(a|s)$ written at the beginning of the equation? Why do we need it? Isn't the probability of taking an action already stated in the $p(s',r|s,a)$ part?
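To make my question concrete, here is a minimal sketch of the equation on a hypothetical two-state MDP (the state names, actions, and all probabilities below are made up for illustration). The point I am asking about: `p` is indexed by `(s, a)`, i.e. the action is *given*, while `pi` is a separate distribution over actions.

```python
gamma = 0.9

# Policy pi(a | s): probability of choosing each action in each state.
pi = {
    "s0": {"left": 0.5, "right": 0.5},
    "s1": {"left": 1.0, "right": 0.0},
}

# Dynamics p(s', r | s, a): here the action a is a GIVEN (it is part of
# the key), so p says nothing about how likely the agent is to pick a.
p = {
    ("s0", "left"):  {("s0", 0.0): 1.0},
    ("s0", "right"): {("s1", 1.0): 1.0},
    ("s1", "left"):  {("s0", 0.0): 1.0},
    ("s1", "right"): {("s1", 0.0): 1.0},
}

def bellman_backup(v, s):
    """One application of the Bellman equation for v_pi at state s."""
    total = 0.0
    for a, prob_a in pi[s].items():                      # weighted by pi(a|s)
        for (s_next, r), prob_sr in p[(s, a)].items():   # weighted by p(s',r|s,a)
            total += prob_a * prob_sr * (r + gamma * v[s_next])
    return total

v = {"s0": 0.0, "s1": 0.0}
print(bellman_backup(v, "s0"))  # → 0.5
```

In this sketch both sums of the equation appear explicitly, which is why I don't see where the action probability would "hide" inside `p`.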
Topic dynamic-programming reinforcement-learning machine-learning
Category Data Science