How to handle differences between training and deployment of an RL agent

Hi, I am training an RL agent for a control problem. The agent's objective is to maintain the temperature in a zone. It is an episodic task with an episode length of 10 hours and an action taken every 15 minutes. Ambient weather is one of the state variables during training. For training, an ambient temperature profile was generated for each hour of the day. I trained the agent using the PPO algorithm, and training is converging. I wish to deploy this model in a real-world setting and have two questions about it.

  1. If I train the agent to take an action every 15 minutes, is it OK to have it take actions every 5 minutes during deployment?
  2. If I train the agent on a particular ambient temperature profile, will it memorise that profile and need the same temperature profile during deployment to work well?

Can someone help me with these two questions?



  1. If I train the agent to take an action every 15 minutes, is it OK to have it take actions every 5 minutes during deployment?

It is impossible to say in general. Depending on the nature of the environment and controller, this may work just fine, or may completely destroy the ability of the agent to function at all.

I would suspect that you could get away with this change if your controller adjusts quantities that set rates of change, e.g. the power levels of heating or cooling. That is because the amount of change per time step (from both controller and environment) is likely to scale with the size of the time step. The agent is also likely to choose at $t' = 1, 2, 3$ (the 5-minute steps) the same action that the original agent chose at $t = 1$ (the corresponding 15-minute step), because the state will not have changed as much.
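As a rough illustration of that scaling argument, here is a minimal sketch. It assumes a first-order thermal model for the zone with a made-up time constant (`TAU_HOURS`) and heater gain (`HEAT_GAIN`); none of these names or numbers come from the question. It holds the same heating power for one 15-minute step and for three 5-minute steps and compares where the zone temperature ends up.

```python
# Hypothetical first-order zone model: dT/dt = (T_amb - T) / tau + gain * power.
# All constants are illustrative assumptions, not values from the question.
TAU_HOURS = 5.0   # assumed thermal time constant of the zone
HEAT_GAIN = 2.0   # assumed heating rate in deg C per hour at full power


def euler_step(zone_temp, ambient, power, dt_hours):
    """Advance the zone temperature by one explicit Euler step."""
    drift = (ambient - zone_temp) / TAU_HOURS + HEAT_GAIN * power
    return zone_temp + dt_hours * drift


def simulate(zone_temp, ambient, power, dt_minutes, total_minutes=15):
    """Hold one power level for total_minutes, stepping every dt_minutes."""
    n_steps = total_minutes // dt_minutes
    for _ in range(n_steps):
        zone_temp = euler_step(zone_temp, ambient, power, dt_minutes / 60.0)
    return zone_temp


# Same action held over the same 15 minutes, at two step sizes.
coarse = simulate(20.0, 10.0, 0.5, dt_minutes=15)
fine = simulate(20.0, 10.0, 0.5, dt_minutes=5)
gap = abs(coarse - fine)
print(f"15-min step: {coarse:.4f}  3 x 5-min steps: {fine:.4f}  gap: {gap:.4f}")
```

In this toy model the two results stay close because the action sets a rate. This only covers the environment side of the argument. Whether the trained policy behaves sensibly when queried three times as often still has to be checked with the real agent.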

Much better, though, would be to alter your training simulation to be more accurate in this regard. There is a trade-off here. The more faithful your simulation is to real-world use, the more likely it is that you will train an agent which works well in the real world. However, creating precise and accurate simulations can be hard work. I cannot tell how much work going from 15-minute time steps to 5-minute ones would be for you. From the description alone it does not seem like much effort, but perhaps it is not under your control.

  2. If I train the agent on a particular ambient temperature profile, will it memorise that profile and need the same temperature profile during deployment to work well?

Over-fitting and failure to generalise are issues with neural networks wherever they are used. Yes, this is possible.

You can test for one or both of these differences between training and production systems (time step and temperature profile). Check whether your controller has generalised well enough to deploy by testing your agent in a simulation with characteristics closer to production. For instance, you can run the simulation with newly generated external temperature profiles that the agent has not seen before.
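A test on unseen profiles could look something like the sketch below. Everything in it is an assumption for illustration: the profile generator (`make_profile`), the thermal constants, the setpoint, and especially `stand_in_policy`, which is a simple proportional controller standing in for your trained PPO agent. The point is only the shape of the test: generate profiles from a different seed than training used, run episodes, and look at the tracking error.

```python
import math
import random

SETPOINT = 21.0   # hypothetical target zone temperature, deg C
TAU_HOURS = 5.0   # assumed thermal time constant of the zone
HEAT_GAIN = 6.0   # assumed heating rate in deg C per hour at full power


def make_profile(rng, hours=10):
    """Hourly ambient temperatures for one episode (hypothetical generator)."""
    mean = rng.uniform(0.0, 15.0)
    amplitude = rng.uniform(2.0, 8.0)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return [
        mean + amplitude * math.sin(2.0 * math.pi * h / 24.0 + phase)
        + rng.gauss(0.0, 0.5)
        for h in range(hours)
    ]


def stand_in_policy(zone_temp, ambient):
    """Placeholder for the trained agent: heating power clipped to [0, 1]."""
    return min(1.0, max(0.0, 0.5 * (SETPOINT - zone_temp)))


def episode_error(policy, profile, dt_minutes=15):
    """Mean absolute deviation from the setpoint over one episode."""
    zone_temp = SETPOINT
    n_steps = len(profile) * 60 // dt_minutes
    total = 0.0
    for i in range(n_steps):
        ambient = profile[(i * dt_minutes) // 60]  # hourly profile lookup
        power = policy(zone_temp, ambient)
        drift = (ambient - zone_temp) / TAU_HOURS + HEAT_GAIN * power
        zone_temp += (dt_minutes / 60.0) * drift
        total += abs(zone_temp - SETPOINT)
    return total / n_steps


# Use a seed that was NOT used to generate the training profile.
rng = random.Random(1234)
unseen = [make_profile(rng) for _ in range(5)]
errors = [episode_error(stand_in_policy, p) for p in unseen]
for i, err in enumerate(errors):
    print(f"unseen profile {i}: mean abs error {err:.2f} deg C")
```

If the error on unseen profiles is much worse than on the training profile, that suggests the agent has over-fitted to the profile it was trained on.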

You can also, at least in theory, decrease the time step size in the simulation environment, although I recommend that you make that change during training instead, unless there is some strong reason why it is not possible. It is better to fix parameters that you can, rather than expect the agent to generalise over them - especially in your case where you only have one target value.
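One way to make the step size something you fix rather than something the agent must generalise over is to expose it as a parameter of the simulation. The sketch below is a hypothetical, hand-rolled environment (`ZoneEnv`), not your simulator and not any particular library's API. The thermal model and constants are assumptions, and scaling the reward by the step length is my own choice so that episode returns are roughly comparable across step sizes.

```python
TAU_HOURS = 5.0   # assumed thermal time constant of the zone
HEAT_GAIN = 6.0   # assumed heating rate in deg C per hour at full power


class ZoneEnv:
    """Hypothetical zone-temperature environment with a configurable step."""

    def __init__(self, ambient_profile, dt_minutes=15, episode_hours=10,
                 setpoint=21.0):
        self.ambient_profile = list(ambient_profile)  # one value per hour
        self.dt_minutes = dt_minutes
        self.n_steps = episode_hours * 60 // dt_minutes
        self.setpoint = setpoint
        self.reset()

    def _ambient(self):
        # Integer arithmetic avoids float rounding at hour boundaries.
        hour = (self.step_count * self.dt_minutes) // 60
        return self.ambient_profile[min(hour, len(self.ambient_profile) - 1)]

    def reset(self):
        self.step_count = 0
        self.zone_temp = self.setpoint
        return (self.zone_temp, self._ambient())

    def step(self, power):
        power = min(1.0, max(0.0, power))
        dt_hours = self.dt_minutes / 60.0
        drift = (self._ambient() - self.zone_temp) / TAU_HOURS + HEAT_GAIN * power
        self.zone_temp += dt_hours * drift
        self.step_count += 1
        # Reward weighted by step length (an assumption, see lead-in).
        reward = -abs(self.zone_temp - self.setpoint) * dt_hours
        done = self.step_count >= self.n_steps
        return (self.zone_temp, self._ambient()), reward, done


def run_episode(env, policy):
    """Run one episode and return (number of steps, total reward)."""
    obs = env.reset()
    steps, total, done = 0, 0.0, False
    while not done:
        obs, reward, done = env.step(policy(*obs))
        steps += 1
        total += reward
    return steps, total


def stand_in_policy(zone_temp, ambient):
    """Placeholder for the agent being trained."""
    return 0.5 * (21.0 - zone_temp)


profile = [10.0] * 10
steps_15, return_15 = run_episode(ZoneEnv(profile, dt_minutes=15), stand_in_policy)
steps_5, return_5 = run_episode(ZoneEnv(profile, dt_minutes=5), stand_in_policy)
print(steps_15, steps_5)
```

With the 10-hour episode from the question, the same class yields 40 decisions per episode at 15 minutes and 120 at 5 minutes, so training at the deployment step size is a one-argument change in this sketch.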

If you are not able to adjust your simulation to use it for testing, then you have to test your trained controller in a real-world environment. At that point you will want to monitor it closely to see how well it performs. That is good practice regardless of an additional test phase in simulation. Having a test phase in simulation is simply a good idea to mitigate risk and reduce costs.

