Confusion about the Bellman Equation

In some resources, the Bellman equation is written as follows:

$v_{\pi}(s) = \sum\limits_{a}\pi(a|s)\sum\limits_{s',r}p(s',r|s,a)\big[r+\gamma v_{\pi}(s')\big] $

What confuses me are the $\pi$ and $p$ terms on the right-hand side.

Since the probability term $p(s',r|s,a)$ gives the probability of ending up in the next state $s'$, and reaching $s'$ can only happen by taking some specific action, it seems to me that the $p$ term already includes the probability of taking that action.

But then why is $\pi(a|s)$ written at the beginning of the equation? Why do we need it? Isn't the probability of taking an action already captured by the $p(s',r|s,a)$ term?



$p(s', r | s, a)$ is the probability of arriving at state $s'$ and obtaining reward $r$, given that the environment was in state $s$ and the agent took action $a$. This probability is therefore defined assuming that $a$ has already been taken; the probability of taking $a$ is not included in it.

The probability of the agent taking an action is provided by the policy $\pi$, and that is why we need it in the equation.
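Written out, the two factors chain together: marginalizing the action out of the joint distribution over $(a, s', r)$ gives

$p(s', r \mid s) = \sum\limits_{a}\pi(a|s)\,p(s', r \mid s, a)$

so the dynamics alone, without $\pi$, cannot tell you how likely $s'$ is when starting from $s$.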

You can think of the interaction of these two terms with the law of total probability: $p(A)=\sum _{n}p(A\mid B_{n})p(B_{n})$, where $p(B_{n})$ is analogous to $\pi(a|s)$ and $p(A\mid B_{n})$ is analogous to $p(s', r | s, a)$.
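To see how the two probabilities interact numerically, here is a minimal sketch in Python (the two-state MDP, its dynamics, and the policy are all made-up numbers for illustration): iterative policy evaluation applies the Bellman equation as an update rule, weighting each action by $\pi(a|s)$ and each outcome of that action by $p(s',r|s,a)$.

```python
# A made-up 2-state, 2-action MDP for illustration.
# p[s][a] is the dynamics p(s', r | s, a): a list of
# (probability, next_state, reward) triples, defined GIVEN that
# action a was taken -- no action probability lives here.
p = {
    0: {0: [(0.8, 0, 0.0), (0.2, 1, 1.0)],
        1: [(0.5, 0, 0.0), (0.5, 1, 2.0)]},
    1: {0: [(1.0, 0, 0.0)],
        1: [(1.0, 1, 0.5)]},
}

# The policy pi(a | s): the probability of choosing each action in each state.
pi = {0: {0: 0.6, 1: 0.4},
      1: {0: 0.9, 1: 0.1}}

gamma = 0.9  # discount factor

# Iterative policy evaluation: repeatedly apply the Bellman equation
# as an update; the values converge to v_pi.
v = {0: 0.0, 1: 0.0}
for _ in range(1000):
    v = {
        s: sum(
            pi[s][a]                               # probability of taking a
            * sum(prob * (r + gamma * v[s2])       # expectation over (s', r)
                  for prob, s2, r in p[s][a])
            for a in p[s]
        )
        for s in p
    }
```

At the fixed point, each $v(s)$ equals exactly the double sum in the equation: the outer sum (weighted by $\pi$) averages over actions, and the inner sum (weighted by $p$) averages over the outcomes of each action. Dropping the $\pi(a|s)$ factor would amount to giving every action weight 1, which is not a probability distribution over actions.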
