Different Initial Q-Values in Q-Learning
When working with Q-Learning, what is the difference between having a Q_0(a) with all values zero, random or optimistic?
Topic q-learning reinforcement-learning
Category Data Science
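In the long run, tabular Q-learning converges to the optimal action values regardless of initialization, provided every state-action pair keeps being visited and the step size decays appropriately.
However, the initial values can affect how quickly it gets there, much as they do in the n-armed bandit setting: http://incompleteideas.net/book/first/ebook/node21.html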
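To make the bandit analogy concrete, here is a minimal sketch in Python. It is not taken from the linked chapter: the function name run_greedy_bandit, the three noise-free arm rewards, the step size 0.1 and the optimistic value 5.0 are all assumptions chosen for illustration. The agent is purely greedy (no exploration), which is outside the conditions of the convergence guarantee but isolates the effect of Q_0.

```python
def run_greedy_bandit(q0, rewards=(0.2, 0.5, 0.8), alpha=0.1, steps=500):
    """Purely greedy agent on a toy bandit with fixed, noise-free rewards.

    All names and numbers here are illustrative assumptions.
    """
    q = [float(q0)] * len(rewards)   # Q_0(a): same initial value for every arm
    counts = [0] * len(rewards)      # how often each arm was pulled
    for _ in range(steps):
        # Greedy choice; ties go to the lowest-indexed arm.
        a = max(range(len(q)), key=q.__getitem__)
        # Constant step-size update, as in tabular Q-learning.
        q[a] += alpha * (rewards[a] - q[a])
        counts[a] += 1
    return q, counts


for name, q0 in [("zero", 0.0), ("optimistic", 5.0)]:
    q, counts = run_greedy_bandit(q0)
    best = max(range(len(q)), key=q.__getitem__)
    print(f"{name:>10}: pulls per arm = {counts}, greedy arm = {best}")
```

With zero initialization the first pull raises the estimate of arm 0 above the others, so the greedy agent never tries anything else. With optimistic initialization every arm is "disappointing" at first, so the agent is pushed to try all of them before settling on the best one (arm 2 here). Adding exploration such as epsilon-greedy would let the zero-initialized agent recover too, only more slowly.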
For more on initialization in Q-learning, I recommend "Potential-based shaping and Q-value initialization are equivalent" by Eric Wiewiora.