Soft actor-critic reinforcement learning for 100x100 maze environment
I am doing a project that requires a soft actor-critic reinforcement learning agent to learn how to reach a goal in a 100x100 maze environment like the one below:
- The state space is discrete and only the agent's current position is passed as the state. For example, the state is (50, 4) in the image.
- The action-space is also discrete and just includes [left, right, up, down, up-left, up-right, down-left, down-right].
- The reward function is sparse: +100 for reaching the goal and nothing otherwise (a minimal sketch of this setup follows this list).
- My implementation and hyper-parameters closely follow the OpenAI Spinning Up implementation, with some tweaking for best performance.
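For concreteness, the environment behaves roughly like the gym-style sketch below (classic gym API; the wall array, start/goal positions, and step limit here are placeholders rather than my actual maze):

```python
import numpy as np
import gym
from gym import spaces

class MazeEnv(gym.Env):
    """Minimal sketch of the maze described above; layout details are placeholders."""

    # 8 moves: left, right, up, down, and the four diagonals
    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1),
             (-1, -1), (1, -1), (-1, 1), (1, 1)]

    def __init__(self, walls, start=(0, 0), goal=(99, 99), max_steps=10_000):
        self.walls = walls                      # 100x100 boolean array, True where there is a wall
        self.start, self.goal = start, goal
        self.max_steps = max_steps
        self.action_space = spaces.Discrete(8)
        self.observation_space = spaces.MultiDiscrete([100, 100])  # state = (x, y) position only

    def reset(self):
        self.pos, self.steps = self.start, 0
        return np.array(self.pos)

    def step(self, action):
        dx, dy = self.MOVES[action]
        x, y = self.pos[0] + dx, self.pos[1] + dy
        # stay in place if the move would leave the grid or hit a wall
        if 0 <= x < 100 and 0 <= y < 100 and not self.walls[x, y]:
            self.pos = (x, y)
        self.steps += 1
        done = self.pos == self.goal or self.steps >= self.max_steps
        reward = 100.0 if self.pos == self.goal else 0.0
        return np.array(self.pos), reward, done, {}
```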
I have tried a tonne of different things, like different discount rates, hyperparameter choices, network architectures, and reward shaping (Euclidean and Manhattan distance to the goal, sketched below), but I just can't manage to get it to learn.
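The shaping variants I tried looked roughly like the sketch below (the scale factor is illustrative, and only one of the two distance terms was used at a time):

```python
import numpy as np

def shaped_reward(pos, goal, reached_goal, scale=0.01):
    """Sketch of the distance-based shaping I tried; coefficients are illustrative."""
    if reached_goal:
        return 100.0
    euclidean = float(np.linalg.norm(np.subtract(pos, goal)))
    # manhattan = abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])   # the other variant
    # negative distance, so the reward grows as the agent gets closer to the goal
    return -scale * euclidean
```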
I know that it is a hard problem, and that the reward shaping isn't a great fit because the problem is quite non-linear (straight-line distance to the goal says little about the actual path through the maze). Also, passing only the current location as the state may be problematic, since the agent has no information about where the walls are.
Some things to note:
- I need to use soft actor-critic, since this is just one component in a larger system (a simplified sketch of my discrete-action adaptation follows this list).
- I tried passing the raw image pixels as the state, but that makes training very slow given my somewhat restricted resources.
- Any solution needs to generalise to continuous action-spaces eventually.
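Since the Spinning Up SAC code is written for continuous actions, my adaptation for the discrete action space swaps the Gaussian policy for a categorical head, roughly like the heavily simplified sketch below (layer sizes are illustrative, and the critics and training loop are omitted):

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class DiscreteSACActor(nn.Module):
    """Categorical policy head for discrete-action SAC (simplified sketch)."""

    def __init__(self, obs_dim=2, n_actions=8, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        logits = self.net(obs)
        dist = Categorical(logits=logits)
        action = dist.sample()
        # probabilities and log-probabilities over all actions are needed
        # for the expectations in the discrete SAC entropy and critic targets
        return action, dist.probs, torch.log_softmax(logits, dim=-1)
```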
Does anyone have any good ideas for solving a problem like this? Any tips or advice would be so appreciated!
Topic: actor-critic, ai, reinforcement-learning, deep-learning, machine-learning
Category: Data Science