What are the differences between Reinforcement Learning (RL) and Supervised Learning?

Does RL have more difficulty in finding a stable solution?

Does Q-learning have more difficulty in finding a stable solution?

Does getting stuck in a local minimum happen more in supervised learning?

Is this figure correct in saying that Supervised Learning is part of RL?

Topics: supervised-learning, markov-process, reinforcement-learning

Category: Data Science


Reinforcement learning is different from supervised learning, the kind of learning studied in most current research in the field of machine learning. Supervised learning is learning from a training set of labeled examples provided by a knowledgable external supervisor. Each example is a description of a situation together with a specification—the label—of the correct action the system should take to that situation, which is often to identify a category to which the situation belongs. The object of this kind of learning is for the system to extrapolate, or generalize, its responses so that it acts correctly in situations not present in the training set. This is an important kind of learning, but alone it is not adequate for learning from interaction. In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act. In uncharted territory—where one would expect learning to be most beneficial—an agent must be able to learn from its own experience.

(Richard S. Sutton & Andrew G. Barto, Reinforcement Learning: An Introduction, page 2)


What are the differences between Reinforcement Learning (RL) and Supervised Learning?

The main difference lies in how "correct" or optimal results are learned:

  • In Supervised Learning, the learning model is presented with an input and the desired output. It learns by example.

  • In Reinforcement Learning, the learning agent is presented with an environment and must guess the correct output. Whilst it receives feedback on how good its guess was, it is never told the correct output (and, in addition, the feedback may be delayed). It learns by exploration, or trial and error, as the sketch below illustrates.
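As a rough way to see that contrast in code, here is a minimal sketch in Python. The toy data, the thresholded predictor, the two-action reward function and the learning rates are all invented for illustration; none of them come from the question or the quoted book.

```python
import random

# --- Supervised learning: the correct answer is provided for every input ---
dataset = [(0.0, 0), (1.0, 1), (2.0, 1)]           # made-up (input, correct label) pairs

def supervised_update(weight, x, y_true, lr=0.1):
    y_pred = 1 if weight * x > 0.5 else 0          # toy thresholded predictor
    error = y_true - y_pred                        # error measured against the given label
    return weight + lr * error * x                 # learn directly from the labelled example

weight = 0.0
for x, y in dataset:
    weight = supervised_update(weight, x, y)

# --- Reinforcement learning: only a scalar reward for the action actually taken ---
q = {0: 0.0, 1: 0.0}                               # value estimates for two possible actions

def rl_update(q, reward_fn, epsilon=0.1, alpha=0.1):
    # The agent must choose an action itself; nobody tells it which one was correct.
    if random.random() < epsilon:
        action = random.choice(list(q))            # explore
    else:
        action = max(q, key=q.get)                 # exploit the current estimates
    reward = reward_fn(action)                     # scalar feedback only, possibly delayed
    q[action] += alpha * (reward - q[action])      # update only the action that was tried
    return q

for _ in range(100):
    rl_update(q, reward_fn=lambda a: 1.0 if a == 1 else 0.0)
```

The point of the contrast is that `supervised_update` is handed the correct label for each input, whereas `rl_update` only ever sees a scalar reward for the action it happened to try.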

Could we say RL has more difficulty in [finding] a stable solution?

No. The types of problem each solves are usually different, so you cannot compare like with like.

However:

  • You could use RL as a framework to produce a predictive model for a dataset (using RL to solve a Supervised Learning problem). This would be inefficient, but it would work and there is no reason to expect it to be unstable.

  • In general, RL problems present additional challenges compared to supervised learning problems with the same degree of complexity in the relationship between input and output. They are "harder" in that sense: more details need to be managed, and there are more hyperparameters to tune.

  • RL can become unstable in ways that do not apply in supervised learning. For instance, Q-learning with neural network approximation tends to diverge and needs special care (often experience replay is enough, as sketched below).
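To make the experience-replay remark a little more concrete, here is a minimal sketch of the idea. The buffer size, batch size, discount factor and the `q_values` callable are placeholder assumptions, not something specified in the answer above.

```python
import random
from collections import deque

# Minimal experience-replay sketch: train on randomly sampled past transitions
# rather than only the most recent one, which breaks the correlations that help
# drive divergence when Q-learning is combined with function approximation.
replay_buffer = deque(maxlen=10_000)

def store(state, action, reward, next_state, done):
    replay_buffer.append((state, action, reward, next_state, done))

def sample_batch(batch_size=32):
    # sample uniformly at random from past experience
    return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))

def q_learning_targets(batch, q_values, gamma=0.99):
    # Standard one-step Q-learning target: r + gamma * max_a' Q(s', a').
    # `q_values(state)` is a placeholder returning a list of action values.
    targets = []
    for state, action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_values(next_state))
        targets.append(reward + bootstrap)
    return targets
```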

Could we say that [getting stuck] in [a] local minimum is seen more in supervised learning?

No. Internally, an RL agent will often use some kind of supervised learning algorithm to predict value functions, and RL does not include any special features that avoid or work around local minima. In addition, an RL agent can get stuck in ways that do not apply to supervised learning: for instance, if an agent never discovers a high reward, it will settle on a policy that entirely ignores the possibility of reaching that reward (see the sketch below).
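Here is a toy two-action example of that failure mode, with invented reward values and learning rate: a purely greedy agent locks onto the first reward it finds and never learns that the other action is better.

```python
import random

# Toy illustration of the "never discovers a high reward" failure mode.
rewards = {0: 1.0, 1: 10.0}        # action 1 is far better, but the agent must find that out
q = {0: 0.0, 1: 0.0}               # initial value estimates

def choose_action(epsilon):
    if random.random() < epsilon:  # exploration: occasionally try a random action
        return random.choice(list(q))
    return max(q, key=q.get)       # exploitation: pick the current best estimate

for _ in range(1000):
    a = choose_action(epsilon=0.0)            # epsilon = 0: pure exploitation
    q[a] += 0.1 * (rewards[a] - q[a])

print(q)  # with epsilon = 0 the agent settles on action 0; q[1] never moves from 0.0
```

In this toy example, setting epsilon to a small positive value (e.g. 0.1) is enough for the agent to eventually sample action 1 and switch to it.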

Is this figure correct in saying that Supervised Learning is part of RL?

No. The figure is, at best, an over-simplified view of one way you could describe the relationships between Supervised Learning, Contextual Bandits and Reinforcement Learning.

The figure is broadly correct in that you could use a Contextual Bandit solver as a framework to solve a Supervised Learning problem, and an RL solver as a framework to solve the other two (a reduction of this kind is sketched at the end of this answer). However, this is not necessary, and the nesting does not capture the real relationships or differences between the three types of algorithm.

Otherwise, the figure contains so much over-simplification, and so many things that I would consider errors (although this might just be missing context that would explain the diagram), that I recommend you simply stop trying to interpret it.
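For illustration, here is a rough sketch of the reduction mentioned above, i.e. wrapping a supervised classification dataset as a contextual bandit. The `dataset` and `bandit_policy` objects are hypothetical placeholders, not part of the original answer.

```python
# The context is the input x, the arms are the candidate labels, and the reward
# is 1 only when the chosen arm matches the (hidden) true label.
def run_as_contextual_bandit(dataset, bandit_policy):
    total_reward = 0
    for x, y_true in dataset:
        arm = bandit_policy.choose(x)          # the solver only ever sees the context x
        reward = 1 if arm == y_true else 0     # it never sees y_true, only this reward
        bandit_policy.update(x, arm, reward)
        total_reward += reward
    return total_reward
```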
