Jack's car rental problem: why deterministic policies?
In Sutton and Barto's book Reinforcement Learning: An Introduction, there is the following problem:
My question is: why are the policies considered here deterministic?
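For context, the book solves this problem with policy iteration. Its policy-improvement step sets pi(s) = argmax_a q(s, a), so every policy it produces picks exactly one action per state and is therefore deterministic. The following is a minimal sketch of that loop on a small hypothetical two-state MDP. It is not Jack's actual problem: the transition array `P`, rewards `R` and discount `GAMMA` are made up for illustration. It also draws random stochastic policies and checks that none of them scores higher than the greedy deterministic one.

```python
import numpy as np

# Hypothetical toy MDP (NOT Jack's car rental): 2 states, 2 actions.
# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
])
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
GAMMA = 0.9


def evaluate(policy):
    """Exact evaluation of a deterministic policy (one action index per state)."""
    n = len(policy)
    P_pi = P[np.arange(n), policy]  # (n, n): rows of P for the chosen actions
    R_pi = R[np.arange(n), policy]  # (n,)
    return np.linalg.solve(np.eye(n) - GAMMA * P_pi, R_pi)


def evaluate_stochastic(pi):
    """Exact evaluation of a stochastic policy with pi[s, a] = Pr(a | s)."""
    n = pi.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # action-averaged transitions
    R_pi = (pi * R).sum(axis=1)            # action-averaged rewards
    return np.linalg.solve(np.eye(n) - GAMMA * P_pi, R_pi)


def q_values(V):
    """q(s, a) = r(s, a) + gamma * sum_{s'} p(s' | s, a) V(s')."""
    return R + GAMMA * (P @ V)


def policy_iteration():
    policy = np.zeros(P.shape[0], dtype=int)
    while True:
        V = evaluate(policy)
        # Greedy improvement: argmax returns a single action per state,
        # so the improved policy is deterministic by construction.
        new_policy = np.argmax(q_values(V), axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy


if __name__ == "__main__":
    policy, V = policy_iteration()
    print("greedy deterministic policy:", policy)
    print("state values:", V)
    # A stochastic policy's value is built from weighted averages of q-values,
    # which can never exceed the maximum, so random mixtures do no better.
    rng = np.random.default_rng(0)
    for _ in range(1000):
        pi = rng.dirichlet(np.ones(P.shape[1]), size=P.shape[0])
        assert np.all(evaluate_stochastic(pi) <= V + 1e-9)
```

The check at the end reflects the general point the sketch is meant to illustrate: in a finite MDP, there is always an optimal policy that is deterministic. Restricting the search to deterministic policies therefore loses nothing and keeps each improvement step a simple argmax.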