Value function when the policy is deterministic

data_science_learner

January 23, 2022, 11:12

This is the expression for the state-value function under a stochastic policy $\pi$:

$\displaystyle v_{\pi}(s)=\sum_{a \in \mathcal{A}}\pi(a|s)\bigg(\mathcal{R}_s^a+\gamma \sum_{s' \in \mathcal{S}} \mathbb{P}_{ss'}^a v_{\pi}(s')\bigg) $

Question: What is the form of the value function when the policy is deterministic?
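One way to approach this, offered as a sketch rather than a definitive answer: a deterministic policy can be treated as the special case where $\pi(a|s)$ puts probability 1 on a single action $a=\pi(s)$ and 0 on all others. Under that assumption the outer sum keeps only one term, giving $v_{\pi}(s)=\mathcal{R}_s^{\pi(s)}+\gamma \sum_{s' \in \mathcal{S}} \mathbb{P}_{ss'}^{\pi(s)} v_{\pi}(s')$. The Python snippet below checks this numerically on a made-up two-state, two-action MDP. The rewards, transition probabilities, the values in `v`, and the names `backup_stochastic`, `backup_deterministic`, `mu`, and `pi` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy MDP: 2 states, 2 actions. All numbers are made up.
GAMMA = 0.9
R = [[1.0, 0.0],                 # R[s][a]: expected reward for action a in state s
     [0.5, 2.0]]
P = [[[0.8, 0.2], [0.1, 0.9]],   # P[s][a][s2]: probability of moving from s to s2
     [[0.5, 0.5], [0.3, 0.7]]]


def backup_stochastic(pi, v, s):
    """Right-hand side of the stochastic-policy expression for state s."""
    n_states = len(v)
    return sum(
        pi[s][a] * (R[s][a] + GAMMA * sum(P[s][a][s2] * v[s2] for s2 in range(n_states)))
        for a in range(len(pi[s]))
    )


def backup_deterministic(mu, v, s):
    """Same expression with the sum over actions collapsed to a = mu[s]."""
    a = mu[s]
    return R[s][a] + GAMMA * sum(P[s][a][s2] * v[s2] for s2 in range(len(v)))


mu = [1, 0]                      # deterministic policy: action chosen in each state
pi = [[0.0, 1.0], [1.0, 0.0]]    # the same policy written as one-hot probabilities
v = [3.0, -1.0]                  # arbitrary values, used only to compare both sides

for s in range(len(v)):
    assert abs(backup_stochastic(pi, v, s) - backup_deterministic(mu, v, s)) < 1e-12
```

The two functions agree for any `v`, because every action other than `mu[s]` is multiplied by a probability of zero in the stochastic form.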

Topic markov-process reinforcement-learning

Category Data Science
