Understanding action space in stable baselines
I am trying to write a reinforcement learning agent using the stable-baselines3 library. The agent(observations) method should return an action. I went through the APIs of different models (like PPO), and they do not really allow us to specify the action space. Instead, the action space is specified in the environment.
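For example, this is a minimal sketch of what I mean (CartPole-v1 is just an illustrative environment; any gym environment would do):

```python
import gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
print(env.action_space)  # Discrete(2) -- defined by the environment, not the model

# PPO takes the environment itself; there is no action-space argument
model = PPO("MlpPolicy", env, verbose=0)
```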
This notebook says:

> The type of action to use (discrete/continuous) will be automatically deduced from the environment action space.
So, it seems that the model deduces the action space from the environment.
Q1. But how exactly does it do that?
Q2. Also, how should my agent(observations) method return an action? By returning the action returned by model.predict()?
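In other words, would something like this be the right way to implement it (a sketch, assuming the model created above)?

```python
def agent(observations):
    # model.predict returns (action, state); the state is only used by
    # recurrent policies, so it can be ignored here
    action, _state = model.predict(observations, deterministic=True)
    return action
```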
Topic openai-gym implementation reinforcement-learning
Category Data Science