openai gym - what is an agent I can use with a multi-discrete action space?

Question

Daniel Paczuski Bak

April 20, 2021, 13:12

I have a custom environment with a multi-discrete action space.

The action and observation spaces are as follows:

Action:

MultiDiscrete([  3 121 121 121   3 121 121 121   3 121 121 121   3 121 121 121   3 121
 121 121   3 121 121 121   3 121 121 121   3 121 121 121   3 121 121 121
   3 121 121 121   3 121 121 121   3 121 121 121   3 121 121 121   3 121
 121 121   3 121 121 121   3 121 121 121   3 121 121 121])

Observation:

MultiDiscrete([100   3   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121
   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121
   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121
   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121
 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121
 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121])

I am having an extremely tough time finding an agent (for example in keras-rl) that is capable of handling these spaces.

According to this issue (https://github.com/keras-rl/keras-rl/issues/224), the keras-rl DDPG agent can handle a multi-discrete action space. However, its model produces float outputs, which I cannot pass as an action to step(), since that function expects integers.

Most other agents seem to use a tanh activation layer, or some layer that produces a binary output. I need an output in the same shape as my action space.

How can this be handled?

Topic openai-gym deep-learning python machine-learning

Category Data Science

Kajetan Janiak · Accepted Answer · April 20, 2021, 13:12

Suppose that right now your space is defined as follows:

import numpy as np
from gym.spaces import Discrete, MultiDiscrete

n_actions = (10, 20, 30)
action_space = MultiDiscrete(n_actions)

A simple solution on the environment side would be to define the space as a single flat discrete space instead:

action_space = Discrete(np.prod(n_actions))

and then convert a discrete action to the corresponding multi-discrete action with the help of np.ndindex:

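# Lookup table: flat index -> tuple of per-dimension indices (row-major order)
mapping = tuple(np.ndindex(n_actions))
# discrete_action is the integer chosen by the agent
multidiscrete_action = mapping[discrete_action]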
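
The lookup table above stores one tuple per possible action. As a rough sketch of the same idea without the table, the conversion can also be computed on the fly with mixed-radix arithmetic. The helper names decode and encode are made up for this illustration and are not part of gym or NumPy; the code assumes Python 3.8+ for math.prod.

```python
from math import prod

n_actions = (10, 20, 30)  # same example sizes as in the answer above


def decode(discrete_action, nvec=n_actions):
    """Flat index -> tuple of per-dimension indices (row-major order)."""
    if not 0 <= discrete_action < prod(nvec):
        raise ValueError("discrete_action is out of range")
    digits = []
    # Peel off the fastest-varying (last) dimension first.
    for n in reversed(nvec):
        discrete_action, digit = divmod(discrete_action, n)
        digits.append(digit)
    return tuple(reversed(digits))


def encode(multidiscrete_action, nvec=n_actions):
    """Tuple of per-dimension indices -> flat index (inverse of decode)."""
    index = 0
    for digit, n in zip(multidiscrete_action, nvec):
        index = index * n + digit
    return index
```

As far as I can tell this follows the same ordering as np.ndindex, so decode(i) should equal mapping[i] from the answer; NumPy's np.unravel_index and np.ravel_multi_index perform the same two conversions. Keep in mind that the flat space has np.prod(n_actions) actions (6000 here), so flattening is only practical for small spaces. For the action space in the question the product is astronomically large.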

Eric Beckwith · Accepted Answer · August 27, 2019, 07:19

OpenAI Baselines, or for me even better, Stable Baselines, has many model options that can handle MultiDiscrete action and/or observation spaces. Building a custom gym environment is also quite straightforward.
