Why use MADDPG rather than treating all cooperating agents as a single meta-agent?
Since MADDPG already uses a centralized critic during training, why not go one step further and simply treat all cooperating agents as a single meta-agent with a concatenated observation space and a concatenated action space? In my view, MADDPG is already centralized enough that this extra step shouldn't hurt.
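To make the structural difference concrete, here is a minimal sketch, assuming PyTorch; the `mlp` helper and all dimensions are made up for illustration. It contrasts MADDPG's layout (one centralized critic per agent, but decentralized actors that each see only their own observation) with a single meta-agent whose policy must consume the concatenated observation and emit the joint action:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2

def mlp(in_dim, out_dim):
    # Tiny two-layer network standing in for any actor/critic architecture.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

# --- MADDPG-style layout ---
# Each actor maps its OWN observation to its OWN action (execution is decentralized).
actors = [mlp(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]
# Each critic sees ALL observations and ALL actions (training is centralized).
critics = [mlp(N_AGENTS * (OBS_DIM + ACT_DIM), 1) for _ in range(N_AGENTS)]

# --- Meta-agent layout ---
# A single policy maps the concatenated observation to the joint action,
# so its input/output sizes grow with the number of agents.
meta_actor = mlp(N_AGENTS * OBS_DIM, N_AGENTS * ACT_DIM)

obs = torch.randn(N_AGENTS, OBS_DIM)                        # one observation per agent
acts = torch.stack([a(o) for a, o in zip(actors, obs)])     # decentralized execution
q = critics[0](torch.cat([obs.flatten(), acts.flatten()]))  # centralized critic value
joint_act = meta_actor(obs.flatten())                       # meta-agent needs all obs at once
```

The sketch is only meant to pin down what I'm asking: in MADDPG each actor can act from its own observation, while the meta-agent's single policy consumes every observation at once and outputs the joint action.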
Topic openai-gym reinforcement-learning
Category Data Science