Different Initial Q-Values in Q-Learning

Question

Giulia

21 November 2021, 09:02

When working with Q-learning, what is the difference between initializing Q_0(a) with all zeros, with random values, or with optimistic values?

Floris den Hengst · Accepted Answer · 17 February 2021, 13:09

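In the long run, tabular Q-learning converges to the optimal action values Q* regardless of initialization, provided every state-action pair keeps being visited and the step sizes satisfy the usual convergence conditions.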
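To illustrate that the fixed point does not depend on Q_0, here is a minimal sketch that runs tabular Q-learning from zero, random and optimistic initial tables and compares each result with Q* computed by value iteration. Everything in it is an assumption for illustration: a hypothetical deterministic 3-state chain, γ = 0.9, a uniformly random behaviour policy, and a constant step size α = 0.5 (adequate here only because transitions and rewards are deterministic; stochastic environments need decaying step sizes).

```python
import random

GAMMA = 0.9        # discount factor (assumed)
ALPHA = 0.5        # constant step size; fine only because this toy MDP is deterministic
STATES = range(3)
ACTIONS = (0, 1)   # 0 = left, 1 = right

def step(s, a):
    """Hypothetical deterministic chain: choosing 'right' in the last state pays 1."""
    s_next = min(s + 1, 2) if a == 1 else max(s - 1, 0)
    reward = 1.0 if (s == 2 and a == 1) else 0.0
    return s_next, reward

def optimal_q(iters=1000):
    """Reference Q* computed by value iteration on the known model."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(iters):
        new_q = {}
        for s in STATES:
            for a in ACTIONS:
                s2, r = step(s, a)
                new_q[(s, a)] = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
        q = new_q
    return q

def q_learning(q0, steps=20000, seed=0):
    """Tabular Q-learning following a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = dict(q0)
    s = 0
    for _ in range(steps):
        a = rng.choice(ACTIONS)
        s2, r = step(s, a)
        target = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += ALPHA * (target - q[(s, a)])
        s = s2
    return q

init_rng = random.Random(42)
inits = {
    "zero": {(s, a): 0.0 for s in STATES for a in ACTIONS},
    "random": {(s, a): init_rng.uniform(-1.0, 1.0) for s in STATES for a in ACTIONS},
    "optimistic": {(s, a): 20.0 for s in STATES for a in ACTIONS},  # above max Q* = 10
}

q_star = optimal_q()
errors = {}
for name, q0 in inits.items():
    q = q_learning(q0)
    errors[name] = max(abs(q[k] - q_star[k]) for k in q_star)
    print(f"{name:>10}: max |Q - Q*| = {errors[name]:.2e}")
```

All three runs should end essentially at Q*; in this setting the initialization only changes how many updates it takes to get there.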

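However, initialization can affect the speed of convergence, much as in the n-armed bandit setting: optimistic initial values push a greedy learner to try every action early, because each action's estimate drops once it has been tried. See http://incompleteideas.net/book/first/ebook/node21.html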
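The bandit effect can be sketched in a few lines. This assumes a hypothetical 4-armed bandit with small bounded reward noise, a purely greedy agent (no ε) with ties broken toward the lowest index, and a constant step size α = 0.1, so the initial value is forgotten gradually rather than after the first pull. With zero initialization the agent locks onto the first arm it tries; with optimistic values of 5.0 every arm disappoints at first, so all arms get sampled before the agent settles on the best one. The outcome of the random initialization depends on the draw.

```python
import random

TRUE_MEANS = [0.2, 0.5, 0.9, 0.1]  # hypothetical arm means; arm 2 is the best
ALPHA = 0.1                         # constant step size (assumed)

def run_greedy(q0, steps=1000, seed=0):
    """Purely greedy agent; returns the fraction of pulls spent on the best arm."""
    rng = random.Random(seed)
    q = list(q0)
    best = max(range(len(TRUE_MEANS)), key=lambda a: TRUE_MEANS[a])
    best_pulls = 0
    for _ in range(steps):
        # Greedy choice; max() returns the first maximum, i.e. the lowest index on ties.
        a = max(range(len(q)), key=lambda i: q[i])
        reward = TRUE_MEANS[a] + rng.uniform(-0.05, 0.05)  # small bounded noise
        q[a] += ALPHA * (reward - q[a])
        best_pulls += (a == best)
    return best_pulls / steps

k = len(TRUE_MEANS)
init_rng = random.Random(1)
results = {
    "zero": run_greedy([0.0] * k),
    "random": run_greedy([init_rng.uniform(-1.0, 1.0) for _ in range(k)]),
    "optimistic": run_greedy([5.0] * k),
}
for name, frac in results.items():
    print(f"{name:>10}: best arm chosen {frac:.1%} of the time")
```

Real testbeds use noisier rewards, where the effect is less clear-cut and optimism mainly helps early on.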

For more on initialization in Q-learning, I recommend "Potential-based shaping and Q-value initialization are equivalent" by Eric Wiewiora.
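As we read that paper, its core observation is that a Q-learner initialized with Q_0(s,a) + Φ(s) and a Q-learner initialized with Q_0(s,a) but receiving the potential-based shaping reward F(s,s') = γΦ(s') − Φ(s) differ by exactly Φ(s) at every step when fed the same experience, because their TD errors are identical. The sketch below checks that invariant numerically on one random run; the MDP tables, the potential Φ (`phi`), γ and α are all hypothetical, and a single run is a sanity check, not a proof.

```python
import random

GAMMA, ALPHA = 0.9, 0.3
N_STATES, N_ACTIONS = 4, 2

rng = random.Random(0)
# Hypothetical potential function and random continuing-task transition/reward tables.
phi = [rng.uniform(-2.0, 2.0) for _ in range(N_STATES)]
next_state = [[rng.randrange(N_STATES) for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
reward = [[rng.uniform(-1.0, 1.0) for _ in range(N_ACTIONS)] for _ in range(N_STATES)]

def update(q, s, a, r, s2):
    """One in-place tabular Q-learning update."""
    target = r + GAMMA * max(q[s2])
    q[s][a] += ALPHA * (target - q[s][a])

base = [[rng.uniform(-1.0, 1.0) for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
q_shaped = [row[:] for row in base]                                # base init, shaped reward
q_init = [[v + phi[s] for v in base[s]] for s in range(N_STATES)]  # base + phi init, raw reward

max_gap = 0.0
s = 0
for _ in range(5000):
    a = rng.randrange(N_ACTIONS)  # both learners see exactly the same experience
    s2, r = next_state[s][a], reward[s][a]
    update(q_shaped, s, a, r + GAMMA * phi[s2] - phi[s], s2)
    update(q_init, s, a, r, s2)
    # Track the largest deviation from Q_init = Q_shaped + phi(s) seen so far.
    for st in range(N_STATES):
        for ac in range(N_ACTIONS):
            max_gap = max(max_gap, abs(q_init[st][ac] - q_shaped[st][ac] - phi[st]))
    s = s2
print(f"max |Q_init - Q_shaped - phi(s)| over the run: {max_gap:.2e}")
```

Since the two tables differ only by a per-state offset, greedy action choices in each state coincide, which is why shaping and initialization lead to the same behaviour.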