How do I choose a discount factor in Markov Decision Problems?

Question

Austin Capobianco

August 2, 2021, 08:58

I'm referring to the gamma in the Value function:

agold · Accepted Answer · August 2, 2021, 08:58

The right discount factor $\gamma$ depends on the problem. As explained by Sutton & Barto, its value always lies between 0 and 1: $0 \le \gamma \le 1$. If $\gamma = 0$, the policy is myopic (greedy with respect to the immediate reward): it chooses the action that is best for the current step only. If $\gamma > 0$, (possible) future rewards are taken into account. If $\gamma < 1$, the infinite sum of discounted rewards is finite as long as the reward sequence is bounded.
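To make the boundedness claim concrete, here is the discounted return written out, assuming Sutton & Barto's notation ($G_t$ for the return, $R_{t+k+1}$ for the rewards) and a hypothetical bound $R_{\max}$ on the absolute reward:

$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad |G_t| \le \sum_{k=0}^{\infty} \gamma^k R_{\max} = \frac{R_{\max}}{1-\gamma} \quad \text{for } 0 \le \gamma < 1.$$

With $\gamma = 1$ the geometric series no longer converges, so the return is only guaranteed to be finite if the episode terminates.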

As also noted in this related answer, a higher $\gamma$ optimizes the policy for gains further in the future, but it takes longer to converge.
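The convergence cost can be seen in a minimal sketch that counts value-iteration sweeps for several discount factors. The toy chain MDP, the function name `sweeps_until_converged`, and the tolerance are all assumptions made for illustration, and value iteration stands in for whatever learning algorithm you actually use.

```python
def sweeps_until_converged(gamma, n_states=5, tol=1e-6, max_sweeps=100_000):
    """Count synchronous value-iteration sweeps until the largest update < tol.

    Hypothetical toy MDP: states 0..n_states-1 on a line, actions left/right,
    deterministic moves, reward 1 for pressing 'right' in the last state
    (which keeps the agent there), reward 0 everywhere else.
    """
    last = n_states - 1
    values = [0.0] * n_states
    for sweep in range(1, max_sweeps + 1):
        new_values = []
        for s in range(n_states):
            q_left = gamma * values[max(s - 1, 0)]
            q_right = (1.0 if s == last else 0.0) + gamma * values[min(s + 1, last)]
            new_values.append(max(q_left, q_right))
        # Largest change of any state value in this sweep
        delta = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if delta < tol:
            return sweep
    return max_sweeps  # did not converge within the sweep budget


for gamma in (0.5, 0.9, 0.99):
    print(gamma, sweeps_until_converged(gamma))
```

In this setup the update size shrinks roughly by a factor of $\gamma$ per sweep, so the sweep count grows quickly as $\gamma$ approaches 1.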

Niklas Riewald · Accepted Answer · July 30, 2021, 08:44

$\gamma$ is a hyperparameter of the reinforcement learning algorithm, so you can apply hyperparameter optimization techniques such as grid search or Bayesian optimization.

No general recommendation is possible because the choice depends on the problem you want to solve. But you will probably need a value near 1 if your rewards are quite sparse.
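A grid search over $\gamma$ might look like the sketch below. Everything in it is an assumption for illustration: the toy chain with a small reward for idling at the start and a larger, distant reward at the far end, the use of value iteration as the "algorithm", and the undiscounted 100-step return as the score used to compare candidates.

```python
N_STATES = 10        # hypothetical toy chain: states 0..9
SMALL_REWARD = 0.1   # reward for idling at state 0 (action 'left')
GOAL_REWARD = 1.0    # reward for idling at the far end (action 'right')


def step(state, action):
    """Deterministic transition; action is -1 (left) or +1 (right)."""
    last = N_STATES - 1
    next_state = min(max(state + action, 0), last)
    if state == 0 and action == -1:
        return next_state, SMALL_REWARD
    if state == last and action == +1:
        return next_state, GOAL_REWARD
    return next_state, 0.0


def q_value(values, state, action, gamma):
    next_state, reward = step(state, action)
    return reward + gamma * values[next_state]


def plan(gamma, sweeps=2000):
    """Run value iteration with discount gamma, then return the greedy policy."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [max(q_value(values, s, a, gamma) for a in (-1, +1))
                  for s in range(N_STATES)]
    return [max((-1, +1), key=lambda a: q_value(values, s, a, gamma))
            for s in range(N_STATES)]


def evaluate(policy, horizon=100):
    """Score a policy by its undiscounted return from state 0 (assumed metric)."""
    state, total = 0, 0.0
    for _ in range(horizon):
        state, reward = step(state, policy[state])
        total += reward
    return total


def grid_search(gammas):
    """Try every candidate gamma and keep the best-scoring one."""
    scores = {g: evaluate(plan(g)) for g in gammas}
    best = max(scores, key=scores.get)
    return best, scores


best_gamma, scores = grid_search([0.5, 0.7, 0.8, 0.9, 0.99])
print(scores, best_gamma)
```

In this toy setup the low discount factors settle for the small reward at the start, while the higher ones walk to the distant reward, which matches the sparse-reward advice above. The score deliberately does not depend on $\gamma$ itself; comparing discounted returns across different discount factors would not be a fair comparison.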

hoaphumanoid · Accepted Answer · February 5, 2016, 15:35

This is the typical value function of reinforcement learning. The discount factor weights the importance of accumulated future events in the current value. The smaller it is, the less future events matter when choosing the current action.

Usually this number is selected heuristically. I usually select 0.9. If I don't want any discount then I would select 1.
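To get a feel for what 0.9 versus 1 means in practice, the following sketch computes the discounted return of a short reward sequence for a few discount factors. The function name `discounted_return` and the example rewards are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Return sum over k of gamma**k * rewards[k], accumulated back to front."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A single reward of 10 that arrives four steps in the future
rewards = [0, 0, 0, 0, 10]
for gamma in (0.0, 0.5, 0.9, 1.0):
    print(gamma, discounted_return(rewards, gamma))
```

With $\gamma = 1$ the delayed reward counts in full, with $\gamma = 0.9$ it is scaled by $0.9^4 \approx 0.66$, and with $\gamma = 0$ it is ignored entirely.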
