Soft actor-critic reinforcement learning for 100x100 maze environment

I am doing a project which requires a soft actor-critic reinforcement learning agent to learn how to reach a goal in a 100x100 maze environment like the one below:

  • The state space is discrete and only the agent's current position is passed as the state. For example, the state is (50, 4) in the image.
  • The action-space is also discrete and just includes [left, right, up, down, up-left, up-right, down-left, down-right].
  • The reward function is just 100 for reaching the goal; a minimal environment sketch along these lines is included after this list.
  • My implementation and hyper-parameters closely follow the OpenAI Spinning Up implementation, with some tweaking for best performance.
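
For concreteness, here is a minimal sketch of an environment with these properties, written in plain Python from the description above. The wall layout, start, and goal values below are placeholders, not my actual maze:

    import numpy as np

    # Placeholder 100x100 maze matching the description above: the state is the
    # agent's (row, col) position, there are 8 discrete moves, and the reward is
    # +100 only when the goal cell is reached.
    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1),      # up, down, left, right
             (-1, -1), (-1, 1), (1, -1), (1, 1)]    # up-left, up-right, down-left, down-right

    class MazeEnv:
        def __init__(self, walls, start=(50, 4), goal=(99, 99)):
            self.walls = walls                      # boolean array, True where a cell is blocked
            self.start, self.goal = start, goal
            self.pos = start

        def reset(self):
            self.pos = self.start
            return np.array(self.pos, dtype=np.float32)

        def step(self, action):
            dr, dc = MOVES[action]
            r, c = self.pos[0] + dr, self.pos[1] + dc
            # stay in place if the move would leave the grid or hit a wall
            if 0 <= r < 100 and 0 <= c < 100 and not self.walls[r, c]:
                self.pos = (r, c)
            done = self.pos == self.goal
            reward = 100.0 if done else 0.0
            return np.array(self.pos, dtype=np.float32), reward, done, {}

    # usage: walls = np.zeros((100, 100), dtype=bool); env = MazeEnv(walls)
    #        obs = env.reset(); obs, reward, done, info = env.step(1)   # move down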

I have tried a tonne of different things, like different discount rates, hyper-parameter choices, network architectures, reward shaping (Euclidean and Manhattan distance), etc., but I just can't manage to get it to learn.
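
To make the distance-based shaping concrete, a standard potential-based version using Manhattan distance looks roughly like this (a sketch only, not necessarily identical to what I ran):

    # Potential-based shaping on top of the sparse reward, with the negative
    # Manhattan distance to the goal as the potential.
    # F(s, s') = gamma * phi(s') - phi(s) leaves the optimal policy unchanged;
    # gamma is the SAC discount factor.
    def manhattan(pos, goal):
        return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

    def shaped_reward(env_reward, pos, next_pos, goal, gamma=0.99):
        phi = -manhattan(pos, goal)
        phi_next = -manhattan(next_pos, goal)
        return env_reward + gamma * phi_next - phi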

I know this is a hard problem, and that the reward shaping isn't great because the distance to the goal is a poor proxy for the actual path length through the maze, which is highly non-linear. Also, passing only the current location as the state may be problematic, since the agent has no information about where the walls are.

Some things to note:

  1. I need to use soft actor-critic, since this is just one component in a larger system.
  2. I tried passing the maze image pixels as the observation, but this makes training very slow given my somewhat restricted compute resources.
  3. Any solution needs to generalise to continuous action-spaces eventually (an illustrative mapping is sketched after this list).
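
On point 3, purely as an illustration of what "generalise to continuous action-spaces" could mean here (everything below is hypothetical and not part of my current setup): the policy could output a 2D vector in [-1, 1]^2 and the environment could snap it to the nearest of the 8 compass moves:

    import numpy as np

    # Hypothetical bridge between a continuous 2D action and the 8 discrete moves:
    # snap the policy's output vector to the closest compass direction.
    MOVES = np.array([(-1, 0), (1, 0), (0, -1), (0, 1),
                      (-1, -1), (-1, 1), (1, -1), (1, 1)], dtype=np.float32)
    UNIT_MOVES = MOVES / np.linalg.norm(MOVES, axis=1, keepdims=True)

    def continuous_to_discrete(action):
        a = np.asarray(action, dtype=np.float32)
        norm = np.linalg.norm(a)
        if norm < 1e-8:
            return 0                      # degenerate zero action: default to "up"
        return int(np.argmax(UNIT_MOVES @ (a / norm)))   # index of the closest move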

Does anyone have any good ideas for solving a problem like this? Any tips or advice would be so appreciated!

Topic: actor-critic, ai, reinforcement-learning, deep-learning, machine-learning
