Compute the state value function v_π(s) for each state under a given policy
The problem statement: State A is absorbing. A transition to A from state 1 or 4 yields an immediate reward of 12. All other transitions incur a reward of 1. Transitions are deterministic (i.e. each action maps a state s to a unique successor state s'). For the remainder of this question, we will assume γ = 1. On this MDP, consider a policy π that assigns transition probabilities as indicated in the figure below. E.g.: π(move to A | currently in state 1) = 1/2 and π(move to 1 | currently in state 2) = 1/2, etc.
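For reference, the quantity being asked about satisfies the Bellman expectation equation for the fixed policy π. With deterministic transitions and γ = 1, and taking the value of the absorbing state A to be 0 (a standard convention when the episode effectively ends there), it reads:

```latex
v_\pi(s) \;=\; \sum_a \pi(a \mid s)\,\bigl[\, r(s,a) + \gamma\, v_\pi(s') \,\bigr],
\qquad \gamma = 1,\quad v_\pi(A) = 0,
```

where s' is the unique successor state reached from s under action a.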
How can I calculate the state value function for each state here? I know how to calculate the optimal state value V*(s): states 1 and 4 transition directly to A, so their V* is 12, and starting from these I can easily work out the optimal values of states 2 and 3. However, I do not know how to calculate the state value function for the given policy, because I do not understand which value to start from. I hope someone can help me here.
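One standard way to evaluate a fixed policy is iterative policy evaluation: initialise all values to 0 (with V(A) fixed at 0) and repeatedly apply the Bellman expectation backup until the values stop changing. Below is a minimal sketch in Python. Note that the transition structure in `policy_transitions` is an assumption on my part (the figure is not reproduced here), so the specific entries are only illustrative; the update loop is the generic procedure.

```python
# Minimal sketch of iterative policy evaluation.
# NOTE: the transitions below are assumed, not taken from the figure in the question.
# Each state maps to a list of (probability, reward, next_state) tuples describing
# the moves the policy pi takes from that state.
policy_transitions = {
    1: [(0.5, 12, "A"), (0.5, 1, 2)],   # assumed: from 1, move to A or to 2
    2: [(0.5, 1, 1), (0.5, 1, 3)],      # assumed: from 2, move to 1 or to 3
    3: [(0.5, 1, 2), (0.5, 1, 4)],      # assumed: from 3, move to 2 or to 4
    4: [(0.5, 12, "A"), (0.5, 1, 3)],   # assumed: from 4, move to A or to 3
}

gamma = 1.0
v = {s: 0.0 for s in policy_transitions}
v["A"] = 0.0  # absorbing state: no further reward accrues

# Repeatedly apply the Bellman expectation backup until the values converge.
for _ in range(1000):
    delta = 0.0
    for s, moves in policy_transitions.items():
        new_v = sum(p * (r + gamma * v[s_next]) for p, r, s_next in moves)
        delta = max(delta, abs(new_v - v[s]))
        v[s] = new_v
    if delta < 1e-10:
        break

print(v)  # v_pi(s) for states 1..4
```

Equivalently, because the Bellman equations for a fixed policy are linear in the unknowns v_π(1), ..., v_π(4), you can solve them directly as a small linear system. The key point either way is that no special starting state is needed: initialising every value to 0 (with v_π(A) = 0) is enough.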
Topic markov-process reinforcement-learning
Category Data Science