Different Initial Q-Values in Q-Learning

When working with Q-learning, what difference does it make whether the initial estimates Q_0(a) are all zero, random, or optimistic?

Topic q-learning reinforcement-learning

Category Data Science


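In the long run, tabular Q-learning converges to the optimal action-value function regardless of initialization, provided every state-action pair keeps being visited and the learning rate is decayed appropriately.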
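
As a rough illustration of that convergence claim, here is a minimal sketch of tabular Q-learning on a made-up 4-state chain (the environment, the `step` and `q_learning` names, and all hyperparameters are assumptions chosen for the example, not something from the question). It runs the same learner from zero, random, and optimistic initial tables. Because this toy environment is deterministic, a constant step size is enough here.

```python
import random

GAMMA = 0.9
N_STATES = 4       # states 0..3; state 3 is terminal
ACTIONS = (0, 1)   # 0 = left, 1 = right


def step(state, action):
    """Deterministic chain: reward 1 only when the terminal state is reached."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(init, episodes=2000, alpha=0.5, seed=0):
    """Tabular Q-learning; `init(rng)` supplies each initial table entry."""
    rng = random.Random(seed)
    q = [[init(rng) for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so a uniformly random behaviour
            # policy is enough to keep visiting every state-action pair.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q[:-1]  # drop the terminal row, which is never updated


q_zero = q_learning(lambda rng: 0.0)
q_random = q_learning(lambda rng: rng.uniform(-1.0, 1.0))
q_optimistic = q_learning(lambda rng: 10.0)
```

In this sketch all three tables should end up close to the same values (for example, about 1.0 for moving right from state 2 and about 0.9 for moving right from state 1), even though they start far apart.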

However, the initialization can affect the speed of convergence, much as in the n-armed bandit setting, where optimistic initial values push a greedy agent to try every action early on: http://incompleteideas.net/book/first/ebook/node21.html
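
The bandit analogy can be sketched in a few lines. The example below is only an illustration under stated assumptions: three hypothetical Bernoulli arms (`ARM_PROBS`), a purely greedy agent, a constant step size, and ties broken in favour of the lowest index. It counts how often each arm is pulled when the estimates start at zero versus at an optimistic value above the largest possible reward.

```python
import random

ARM_PROBS = (0.2, 0.5, 0.8)  # hypothetical success probabilities; arm 2 is best


def greedy_bandit(q_init, steps=1000, alpha=0.1, seed=0):
    """Run a purely greedy agent and return how often each arm was pulled."""
    rng = random.Random(seed)
    q = [q_init] * len(ARM_PROBS)
    counts = [0] * len(ARM_PROBS)
    for _ in range(steps):
        # max() returns the first maximal index, so ties go to the lowest arm.
        arm = max(range(len(q)), key=q.__getitem__)
        reward = 1.0 if rng.random() < ARM_PROBS[arm] else 0.0
        q[arm] += alpha * (reward - q[arm])
        counts[arm] += 1
    return counts


zero_counts = greedy_bandit(q_init=0.0)
optimistic_counts = greedy_bandit(q_init=5.0)
```

With zero initialization and this tie-breaking rule, the greedy agent never leaves arm 0, because its estimate never drops below the untouched zeros of the other arms. With the optimistic start, every pull lowers the chosen arm's estimate below the untried ones, so each arm is sampled before the estimates settle. A different tie-breaking rule or an epsilon-greedy policy would change the zero-initialized behaviour.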

For more on initialization in Q-learning, I recommend "Potential-based shaping and Q-value initialization are equivalent" by Eric Wiewiora (Journal of Artificial Intelligence Research, 2003).
