Computing the state-value function of a Markov decision process from the classical definition
For the above Markov decision process under the given action policy $a_1$, how can I determine the value of state $s_1$ using the state-value definition
$v(s)=E[G_t| S_t=s]$
where $G_t$ is the return? Assume no discounting (i.e., $\gamma=1$).
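To make the definition concrete, here is a minimal Python sketch of what "taking the expectation of $G_t$" could look like: enumerate every trajectory from the start state, weight its return by its probability, and sum. The transition table `TRANSITIONS`, its states, probabilities, and rewards are purely hypothetical placeholders (they are not the process from the figure), and the helper names `trajectories` and `state_value` are made up for illustration. The sketch assumes the policy is already fixed, so each state has a single list of (probability, reward, next state) outcomes, and that the process is episodic with no loops.

```python
# Hypothetical policy-induced dynamics (NOT the MDP from the question):
# state -> list of (probability, reward, next_state); "end" is terminal.
TRANSITIONS = {
    "s1": [(0.5, 1.0, "s2"), (0.5, 0.0, "s3")],
    "s2": [(1.0, 2.0, "end")],
    "s3": [(0.4, 10.0, "end"), (0.6, 0.0, "end")],
    "end": [],
}


def trajectories(state, transitions, gamma=1.0):
    """Yield (probability, return) for every trajectory starting in `state`."""
    if not transitions[state]:
        # Terminal state: one empty trajectory with probability 1 and return 0.
        yield 1.0, 0.0
        return
    for prob, reward, next_state in transitions[state]:
        for tail_prob, tail_return in trajectories(next_state, transitions, gamma):
            # G_t = R_{t+1} + gamma * G_{t+1}, weighted by the path probability.
            yield prob * tail_prob, reward + gamma * tail_return


def state_value(state, transitions, gamma=1.0):
    """v(s) = E[G_t | S_t = s], computed as a probability-weighted sum of returns."""
    return sum(p * g for p, g in trajectories(state, transitions, gamma))


if __name__ == "__main__":
    print(state_value("s1", TRANSITIONS))
```

Enumeration like this only terminates when the process has no cycles; if a state can be revisited, the set of trajectories is infinite and one would instead solve the Bellman equations, which follow from the same definition.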
Topic markov-process reinforcement-learning
Category Data Science