Different results every time I train a reinforcement learning agent

Question

Different results every time I train a reinforcement learning agent

cvg

2019年11月6日 12:06

I am training an RL agent for a control problem using PPO algorithm. I am using stable-baselines library for it.

The objective of an agent is to maintain a temperature of 24 deg in a zone and it takes actions every 15 mins.The length of episode is 9 hrs. I have trained the model for 1 million steps and the rewards have converged. I assume that the agent is trained enough. I have done some experiments and have few questions regarding the training

I test an agent by letting it take actions from a fixed initial state, and monitor the actions taken by actions and states for an episode. When I test the agent multiple times, actions taken and states resulted are different every time. Why is this happening when the agent is trained enough?
I train an agent for 1 million steps. I train another agent for 1 million steps on the same environment with same step of hyperparameters and every thing else same. Both these agents converge. Now when I test these agents actions taken by these agents are not identical/similar. Why is this so?

Can someone help me with these.?

Thank you

Topic actor-critic dqn monte-carlo reinforcement-learning deep-learning

Category Data Science

user1939088 · Accepted Answer · 2019年11月6日 12:06

A part of the agent consists of taking random actions. So there is a % chance that the agent will take a random action instead of an action based on the training. This is called "exploration". This page describes this as "The amount of randomness in action selection depends on both initial conditions and the training procedure. Over the course of training, the policy typically becomes progressively less random, as the update rule encourages it to exploit rewards that it has already found. "
This is normal. The agent's network is initialized with random weights, and part of the actions it takes during the training are also random (see above). So different training runs will produce different results. If you want to circumvent this issue, you could use a fixed seed for the random-number generator.

Different results every time I train a reinforcement learning agent

About