Output representation for a neural network to learn a grid-based game with multiple units

I have a round-based game played on a grid map with multiple units that I would like to control with a neural network (NN). All units move at once, and each unit can move in any of the four grid directions: $up$, $down$, $left$ and $right$.

So if we have $n$ units, the output policy vector of the NN should have $4^n$ entries representing probabilities, one for each possible joint move.

Note that one move represents the actions of all agents, so if $n = 4$ then one move can look like this: $a_i = (up, left, down, down)$.
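For illustration, here is a minimal sketch in plain Python (names are illustrative, not from the question) of how such a flat $4^n$-entry policy output can be mapped to and from per-unit moves, treating the joint action as a base-4 number:

```python
ACTIONS = ("up", "down", "left", "right")

def decode_joint_action(index, n_units):
    """Map a flat policy index in [0, 4**n_units) to one move per unit,
    reading the index as a base-4 number (unit 0 = least significant digit)."""
    moves = []
    for _ in range(n_units):
        moves.append(ACTIONS[index % 4])
        index //= 4
    return tuple(moves)

def encode_joint_action(moves):
    """Inverse mapping: per-unit moves back to a flat policy index."""
    index = 0
    for move in reversed(moves):
        index = index * 4 + ACTIONS.index(move)
    return index

joint = ("up", "left", "down", "down")   # the n = 4 example above
assert decode_joint_action(encode_joint_action(joint), 4) == joint
```

Note that this joint action space grows exponentially in $n$, which is one motivation for the per-agent factorizations discussed in the answer below.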

However, I am struggling to find a proper input/output representation for the NN that is permutation-invariant with respect to the agents' positions and from which I can still decode actions for particular agents.
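To make the requirement concrete, here is a minimal sketch (with assumed, illustrative names and shapes) of one encoding that has this property: units appear only as occupancy planes on the grid, and per-agent actions are read back from per-cell action logits.

```python
import numpy as np

ACTIONS = ("up", "down", "left", "right")

def encode_state(unit_positions, grid_h, grid_w):
    """Occupancy-plane encoding: units are represented by *where* they are,
    not by an index, so relabeling the units leaves the tensor unchanged."""
    planes = np.zeros((2, grid_h, grid_w), dtype=np.float32)
    for (r, c) in unit_positions:
        planes[0, r, c] = 1.0          # own-unit occupancy plane
    # planes[1] is left free for other map features (walls, enemies, ...)
    return planes

def decode_moves(policy_logits, unit_positions):
    """Given per-cell action logits of shape (4, H, W), e.g. from a
    convolutional policy head, read each unit's move at its own cell."""
    return {pos: ACTIONS[int(np.argmax(policy_logits[:, pos[0], pos[1]]))]
            for pos in unit_positions}

# Example: two units on a 5x5 grid, random logits standing in for a network.
state = encode_state([(0, 0), (2, 3)], 5, 5)
moves = decode_moves(np.random.randn(4, 5, 5), [(0, 0), (2, 3)])
```

This recovers per-agent actions from positions alone, though it assumes at most one unit per cell.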

I would appreciate any tips or references that discuss this topic.

Topic: multi-output, representation, reinforcement-learning, deep-learning

Category: Data Science


What you are trying to build is a multi-agent game, so the solution approach comes from the field of multi-agent RL (MARL) rather than single-agent RL. There are differences between the two frameworks that have a huge impact on the process of finding an optimal solution (if an optimal solution can be found at all); the MARL problem is quite hard and does not scale easily to many agents. I suggest you take a look at MAPPO (Multi-Agent PPO) and its paper.

It is important to understand the preferred setup of the solution method: fully decentralized training and execution, fully centralized training and execution, or centralized training with decentralized execution (CTDE). Once you choose the setup, the representation follows from it; for example, in a centralized training setup the observations of all agents are concatenated and fed as input to the value-function model.
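To make that last point concrete, here is a minimal CTDE-style sketch in PyTorch; the sizes (OBS_DIM, N_AGENTS, etc.) are illustrative assumptions, not taken from the question.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not taken from the question.
OBS_DIM, N_AGENTS, N_ACTIONS, HIDDEN = 16, 4, 4, 64

# Decentralized actor: one network shared by all agents and applied per agent,
# which also makes the policy permutation-equivariant over agents.
actor = nn.Sequential(
    nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, N_ACTIONS),
)

# Centralized critic: consumes the concatenation of all agents' observations;
# it is only needed during training, not at execution time.
critic = nn.Sequential(
    nn.Linear(OBS_DIM * N_AGENTS, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, 1),
)

obs = torch.randn(N_AGENTS, OBS_DIM)          # one local observation per agent
logits = actor(obs)                           # (N_AGENTS, N_ACTIONS) logits
actions = torch.distributions.Categorical(logits=logits).sample()
value = critic(obs.flatten().unsqueeze(0))    # centralized state value
```

At execution time only the actor is needed, so each agent can act from its own observation; the critic's concatenated input is what is meant above by combining all observations during centralized training.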
