Value function when the policy is deterministic

This is the expression for the state-value function under a stochastic policy $\pi$:

$\displaystyle v_{\pi}(s)=\sum_{a \in \mathcal{A}}\pi(a|s)\bigg(\mathcal{R}_s^a+\gamma \sum_{s' \in \mathcal{S}} \mathbb{P}_{ss'}^a v_{\pi}(s')\bigg) $

Question: What is the form of the value function when the policy is deterministic?
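
One way to approach this, offered as a sketch rather than as part of the original post: a deterministic policy can be treated as a stochastic one that puts probability 1 on a single action $\pi(s)$ and 0 on every other action. The outer sum over actions then collapses to one term, giving $\displaystyle v_{\pi}(s)=\mathcal{R}_s^{\pi(s)}+\gamma \sum_{s' \in \mathcal{S}} \mathbb{P}_{ss'}^{\pi(s)} v_{\pi}(s')$. The Python snippet below checks this numerically on a small random MDP. The array layout (`R[s, a]`, `P[a, s, s']`), the function names, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np


def evaluate_stochastic(pi, R, P, gamma):
    """Solve the Bellman expectation equation for a stochastic policy.

    pi[s, a] : probability of action a in state s (assumed layout)
    R[s, a]  : expected immediate reward
    P[a, s, t]: probability of moving from s to t under action a
    """
    n_states = R.shape[0]
    r_pi = np.einsum("sa,sa->s", pi, R)    # reward averaged over actions
    P_pi = np.einsum("sa,ast->st", pi, P)  # transitions averaged over actions
    # v = r_pi + gamma * P_pi v  <=>  (I - gamma * P_pi) v = r_pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)


def evaluate_deterministic(policy, R, P, gamma):
    """Same equation when policy[s] is the single action taken in state s."""
    n_states = R.shape[0]
    states = np.arange(n_states)
    r_pi = R[states, policy]       # R_s^{pi(s)}
    P_pi = P[policy, states, :]    # row s is P_{s s'}^{pi(s)}
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)


# Hypothetical toy MDP: 4 states, 3 actions, random rewards and transitions.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 3, 0.9
R = rng.normal(size=(n_states, n_actions))
P = rng.random(size=(n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)  # each row becomes a probability distribution

policy = np.array([0, 2, 1, 2])        # deterministic: one action per state
one_hot = np.eye(n_actions)[policy]    # the same policy written as pi(a|s)

v_det = evaluate_deterministic(policy, R, P, gamma)
v_sto = evaluate_stochastic(one_hot, R, P, gamma)
print(np.allclose(v_det, v_sto))
```

Both functions solve the same linear system; the deterministic version simply indexes the reward and transition arrays at $\pi(s)$ instead of averaging over actions, which is the collapse of the sum described above.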
