OpenAI Gym - which agent can I use with a multi-discrete action space?

I have a custom environment with a multi-discrete action space.

The action and observation spaces are as follows:

Action:

MultiDiscrete([  3 121 121 121   3 121 121 121   3 121 121 121   3 121 121 121   3 121
 121 121   3 121 121 121   3 121 121 121   3 121 121 121   3 121 121 121
   3 121 121 121   3 121 121 121   3 121 121 121   3 121 121 121   3 121
 121 121   3 121 121 121   3 121 121 121   3 121 121 121])

Observation:

MultiDiscrete([100   3   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121
   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121
   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121
   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121
 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121
 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121])

I am having an extremely tough time finding an agent (for example in keras-rl) that is capable of handling these spaces.

This issue: https://github.com/keras-rl/keras-rl/issues/224 indicates that the keras-rl DDPG agent is capable of handling a multi-discrete action space, but the model has a float output that I cannot use as an action for the step() function, which expects an integer output!

Most other agents seem to use a tanh activation layer, or some layer that produces a binary output. I need an output in the same shape as my action space.

How can this be handled?



Suppose your action space is currently defined as follows:

import numpy as np
from gym.spaces import Discrete, MultiDiscrete

n_actions = (10, 20, 30)
action_space = MultiDiscrete(n_actions)

A simple solution on the environment side would be to define the space as

action_space = Discrete(np.prod(n_actions))

and then convert a discrete action to the corresponding multi-discrete action with the help of np.ndindex (this is only practical when np.prod(n_actions) is small enough to enumerate):

mapping = tuple(np.ndindex(n_actions))
multidiscrete_action = mapping[discrete_action]
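
If building the full lookup table is undesirable, the same conversion can probably be done on the fly. The following is a minimal sketch, assuming the same row-major ordering that np.ndindex produces; the helper names to_multidiscrete and to_discrete are made up for illustration and are not part of gym.

```python
import numpy as np

n_actions = (10, 20, 30)

def to_multidiscrete(discrete_action, nvec=n_actions):
    """Decode a flat index into one sub-action per dimension (row-major order)."""
    return tuple(int(i) for i in np.unravel_index(discrete_action, nvec))

def to_discrete(multidiscrete_action, nvec=n_actions):
    """Encode one sub-action per dimension back into a flat index."""
    return int(np.ravel_multi_index(multidiscrete_action, nvec))
```

Inside the environment's step() you would then call to_multidiscrete(action) before applying the action, so the agent only ever sees the flat Discrete space.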

OpenAI Baselines - or, for me even better, Stable Baselines - has several algorithms that can handle MultiDiscrete action and/or observation spaces. Building a custom gym environment is also quite straightforward.
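
As far as I know, the policy-gradient algorithms in these libraries (for example PPO2 and A2C in Stable Baselines) handle a MultiDiscrete action space by producing one categorical distribution per dimension, which is how they end up with integer actions rather than floats. The sketch below only illustrates that idea with plain NumPy; split_logits and greedy_action are hypothetical helpers, not library functions, and the logits are a stand-in for a network output.

```python
import numpy as np

# One group of four sub-actions, taken from the action space in the question.
nvec = (3, 121, 121, 121)

def split_logits(logits, nvec):
    """Cut a flat vector of length sum(nvec) into one slice per dimension."""
    boundaries = np.cumsum(nvec)[:-1]
    return np.split(np.asarray(logits), boundaries)

def greedy_action(logits, nvec):
    """Pick the highest-scoring choice in each slice; returns integers."""
    return np.array([int(np.argmax(head)) for head in split_logits(logits, nvec)])

# Stand-in for a network output: all zeros except one preferred choice per slice.
logits = np.zeros(sum(nvec))
logits[[2, 3 + 5, 3 + 121 + 120, 3 + 242 + 0]] = 1.0
action = greedy_action(logits, nvec)
```

The resulting action has one integer per dimension, each within that dimension's range, so it can be passed straight to step(). A real agent would usually sample from each slice's softmax during training instead of taking the argmax.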
