Value function when the policy is deterministic
This is the Bellman expectation equation for the state-value function $v_{\pi}$ under a stochastic policy $\pi$:
$\displaystyle v_{\pi}(s)=\sum_{a \in \mathcal{A}}\pi(a|s)\bigg(\mathcal{R}_s^a+\gamma \sum_{s' \in \mathcal{S}} \mathbb{P}_{ss'}^a v_{\pi}(s')\bigg) $
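To make the equation concrete, here is a minimal sketch of iterative policy evaluation in Python. It repeatedly applies the right-hand side of the equation to every state until $v_{\pi}$ stops changing. The function name `evaluate_policy`, the nested-list layout `pi[s][a]`, `R[s][a]`, `P[s][a][s']`, and the two-state MDP are hypothetical, chosen only for illustration.

```python
# Hypothetical helper: iterative policy evaluation for the equation above.
def evaluate_policy(pi, R, P, gamma, tol=1e-10, max_iters=10_000):
    """Return v_pi as a list indexed by state.

    pi[s][a]    -- probability pi(a|s) of taking action a in state s
    R[s][a]     -- expected immediate reward R_s^a
    P[s][a][t]  -- transition probability P_{st}^a
    """
    n_states = len(R)
    v = [0.0] * n_states
    for _ in range(max_iters):
        # One synchronous sweep: apply the right-hand side of the equation to every state.
        v_new = []
        for s in range(n_states):
            total = 0.0
            for a, prob in enumerate(pi[s]):
                expected_next = sum(P[s][a][t] * v[t] for t in range(n_states))
                total += prob * (R[s][a] + gamma * expected_next)
            v_new.append(total)
        delta = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if delta < tol:  # stop once a sweep no longer changes v noticeably
            break
    return v


# Toy two-state, two-action MDP (illustrative numbers only).
R = [[1.0, 0.0],   # state 0: action 0 pays 1, action 1 pays 0
     [2.0, 0.0]]   # state 1: action 0 pays 2, action 1 pays 0
P = [[[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
     [[0.0, 1.0], [1.0, 0.0]]]  # state 1: action 0 stays, action 1 moves to state 0
pi = [[0.5, 0.5],  # uniform random policy
      [0.5, 0.5]]

v = evaluate_policy(pi, R, P, gamma=0.5)
print(v)  # approximately [1.25, 1.75]
```

For this toy MDP with $\gamma = 0.5$, writing out the equation for both states gives the linear system $3v_0 - v_1 = 2$, $-v_0 + 3v_1 = 4$, whose solution $v_{\pi} = (1.25, 1.75)$ the iteration reproduces up to the tolerance. Iteration is used here instead of a direct linear solve only to keep the sketch dependency-free.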
Question: What is the form of the value function when the policy is deterministic?
Topic markov-process reinforcement-learning
Category Data Science