Does an RL agent learn during exploitation?

I have just started with RL and have a few questions about it.

  1. Does an RL agent learn during exploitation, or does it only learn during exploration?

  2. Is it possible to train a model only using exploitation (i.e. where exploration is not allowed)?



It depends on how you define learning. In ML, learning usually means adapting the parameters of a model. In that sense, the agent does learn during exploitation: each update shifts more probability mass toward the currently best action, and that probability approaches 1 unless the policy is regularized in some way.
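To make the "probability mass goes to 1" point concrete, here is a minimal sketch, assuming a softmax policy over three actions and a REINFORCE-style update. The function names, the learning rate, and the constant reward of 1.0 are illustrative assumptions, not something from the question. The agent always picks its currently most probable action (pure exploitation), yet its parameters keep changing:

```python
import math

def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def exploit_only_policy_gradient(steps=500, lr=0.5, reward=1.0, n_actions=3):
    """Hypothetical setup: every action pays the same positive reward.

    Returns the probability of the exploited action after each update.
    """
    prefs = [0.0] * n_actions  # policy parameters, initially uniform
    history = []
    for _ in range(steps):
        probs = softmax(prefs)
        # Pure exploitation: always take the most probable action
        # (ties are broken in favor of the lowest index).
        action = max(range(n_actions), key=lambda a: probs[a])
        # Gradient of log pi(action) w.r.t. prefs is one_hot(action) - probs.
        for a in range(n_actions):
            indicator = 1.0 if a == action else 0.0
            prefs[a] += lr * reward * (indicator - probs[a])
        history.append(softmax(prefs)[action])
    return history

history = exploit_only_policy_gradient()
print(f"after first update: {history[0]:.3f}, after last update: {history[-1]:.3f}")
```

The parameters are updated on every step, so the agent is "learning" in the parameter-adaptation sense, but all it learns is to become more certain about the action it already preferred.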


It depends on the game the agent is playing. If there are rewards all over the environment, the agent learns anything new only when the exploration coefficient is greater than zero. That is, if you only allow it to exploit, the agent may as well just exploit the first reward it meets that ends a game.

In this case, it will find the first reward that ends the game and will not change its behavior (it will not learn any other way of playing). If you allow it to explore, on the other hand, it may eventually find a better strategy (learn).
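The difference can be sketched with a tiny two-armed bandit. This is only an illustration under stated assumptions: the payouts (1.0 and 5.0), the epsilon-greedy rule, and the name run_bandit are all made up for the example. With epsilon = 0 the agent locks onto the first reward it finds; with a small epsilon it eventually discovers the better arm.

```python
import random

def run_bandit(epsilon, steps=1000, seed=0):
    """Epsilon-greedy agent on a hypothetical deterministic two-armed bandit."""
    rewards = [1.0, 5.0]   # assumed payouts: arm 1 is better
    q = [0.0, 0.0]         # estimated value of each arm
    counts = [0, 0]        # how often each arm was pulled
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(q))  # explore: random arm
        else:
            # Exploit: best estimate so far (ties go to the lowest index).
            action = max(range(len(q)), key=lambda a: q[a])
        reward = rewards[action]
        counts[action] += 1
        # Incremental average of the rewards seen for this arm.
        q[action] += (reward - q[action]) / counts[action]
    return q, counts

greedy_q, greedy_counts = run_bandit(epsilon=0.0)
eps_q, eps_counts = run_bandit(epsilon=0.1)
print("exploit only :", greedy_q, greedy_counts)
print("epsilon = 0.1:", eps_q, eps_counts)
```

The exploit-only agent does update its estimate for the arm it pulls, but it never tries the second arm, so its estimate for that arm stays at the initial 0.0. Optimistic initial values or randomized tie-breaking would change this particular outcome, which is one reason the answer depends on the setup.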

It is always good to balance exploration and exploitation. The agent should always be able to explore, even if the exploration coefficient is low. That is the whole advantage of Reinforcement Learning.
