Output representation for a neural network learning a grid-based game with multiple units
I have a round-based game played on a grid map with multiple units that I would like to control, in some fashion, with a neural network (NN). All units move simultaneously. Each unit can move in any of the four grid directions: $up$, $down$, $left$ and $right$.
So if we have $n$ units, the NN's output policy vector should have $4^n$ entries representing probabilities, one for each possible move.
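Because each of the $n$ units independently picks one of 4 directions, the set of joint moves is the Cartesian product of $n$ copies of the per-unit move set. The following is a small sketch that enumerates it to check the count; the tuple `MOVES` and the helper `joint_actions` are illustrative names, not part of any library or of the original question.

```python
from itertools import product

# The four grid moves available to every unit (illustrative ordering).
MOVES = ("up", "down", "left", "right")

def joint_actions(n_units):
    """Enumerate every joint move: one direction per unit, all units at once."""
    return list(product(MOVES, repeat=n_units))

for n in (1, 2, 3, 4, 5):
    # One policy entry per joint action, so the output size is 4**n.
    print(n, len(joint_actions(n)), 4 ** n)
```

The count grows exponentially with the number of units: five units already need 1024 output entries.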
Note that one move represents the actions of all agents, so if $n = 4$, one move could look like $a_i = (up, left, down, down)$.
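With a flat policy vector, each joint move needs a fixed index. One way to get one, sketched here under the assumption that unit 0 is the most significant digit and that directions are numbered in the order $up$, $down$, $left$, $right$, is to read the joint move as a base-4 number; decoding that index recovers each unit's action. The names `MOVES`, `encode` and `decode` are hypothetical.

```python
# Hypothetical numbering of the four grid moves: digit value = position in this tuple.
MOVES = ("up", "down", "left", "right")

def encode(joint_move):
    """Map a joint move (one action per unit) to an index in [0, 4**n)."""
    index = 0
    for action in joint_move:
        # Each unit's action is one base-4 digit, unit 0 most significant.
        index = index * len(MOVES) + MOVES.index(action)
    return index

def decode(index, n_units):
    """Recover each unit's action from a joint-move index."""
    actions = []
    for _ in range(n_units):
        index, digit = divmod(index, len(MOVES))
        actions.append(MOVES[digit])
    # Digits come out least significant first, i.e. last unit first.
    return tuple(reversed(actions))

a = ("up", "left", "down", "down")
i = encode(a)
print(i)             # 37
print(decode(i, 4))  # ('up', 'left', 'down', 'down')
```

The fixed digit order is what ties this flat representation to a particular ordering of the units: swapping two units generally changes the index of the same joint move.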
However, I am struggling to find a proper input/output representation for the NN that is permutation invariant with respect to agent positions and from which I can decode the actions of individual agents.
I would appreciate any tips or references that discuss this topic.
Topic multi-output representation reinforcement-learning deep-learning
Category Data Science