Reinforcement learning: negative reward (punish) illegal actions?
If you train an agent using reinforcement learning (with a Q-function in this case), should you give a negative reward (i.e. punish it) when the agent proposes an action that is illegal in the presented state?
I guess that if you only ever select from among the legal actions, the illegal ones would eventually drop out over time. But would punishing them make them drop out sooner, and possibly lead the agent to explore more of the legal actions sooner?
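To make the first option concrete, here is a minimal sketch of what I mean by "only select from the legal actions", assuming a small discrete action set and epsilon-greedy exploration. The function name `select_legal_action`, the list `legal_actions`, and the example Q-values are made up for illustration; in practice the environment would have to tell me which actions are legal.

```python
import random

def select_legal_action(q_values, legal_actions, epsilon, rng=random):
    """Epsilon-greedy choice restricted to the legal actions.

    q_values:      list of Q(s, a) for every action index in the current state
    legal_actions: list of action indices that are legal in this state
                   (hypothetical; supplied by the environment)
    """
    if not legal_actions:
        raise ValueError("no legal action available in this state")
    if rng.random() < epsilon:
        # Explore: sample uniformly from the legal actions only.
        return rng.choice(legal_actions)
    # Exploit: take the best Q-value among the legal actions only,
    # so an illegal action is never proposed even if its Q-value is high.
    return max(legal_actions, key=lambda a: q_values[a])

# Example: action 1 has the highest Q-value but is illegal here.
q_values = [0.2, 1.5, 0.7, 0.9]
legal_actions = [0, 2, 3]
greedy_action = select_legal_action(q_values, legal_actions, epsilon=0.0)
```

With this approach the Q-values of illegal actions are never updated, because the agent never gets the chance to pick them.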
To expand on this: say you're training an autonomous vehicle, and the output is the drive direction (forward or reverse) and the speed. Say that in the current scenario the vehicle must keep its speed within a range, e.g. 20 mph minimum and 40 mph maximum. What do you do when the agent proposes driving forward, but at a speed below the minimum? As another example, say you're training an agent to play a game, and it proposes an illegal action that it cannot perform.
I can't proceed with the action because it's illegal, so what do I do? How do I proceed with training in that situation? I will of course enter the min/max speeds as part of the state given to the agent, but how do I prevent it from proposing actions that are illegal, and how do I proceed with training when it does?
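For concreteness, the punishing option I have in mind would look roughly like the sketch below. It assumes tabular Q-learning, and it assumes that an illegal action is simply not executed, so the agent stays in the same state and receives a fixed penalty. The names `ILLEGAL_PENALTY`, `is_legal`, and `env_step`, the penalty value, and the toy speed environment are all placeholders I made up, not part of any library.

```python
from collections import defaultdict

ILLEGAL_PENALTY = -1.0  # hypothetical value; would need tuning

def q_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Standard one-step Q-learning update on a dict-based Q-table."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])

def step_with_penalty(q, state, action, actions, is_legal, env_step,
                      alpha=0.1, gamma=0.9):
    """Execute legal actions; penalise illegal ones without executing them."""
    if is_legal(state, action):
        reward, next_state = env_step(state, action)
    else:
        # Assumption: the illegal action is rejected, so the state is unchanged.
        reward, next_state = ILLEGAL_PENALTY, state
    q_update(q, state, action, reward, next_state, actions, alpha, gamma)
    return next_state

# Toy example: the state is (min_speed, max_speed) and actions are speeds in mph.
actions = [10, 20, 30, 40, 50]
state = (20, 40)
is_legal = lambda s, a: s[0] <= a <= s[1]
env_step = lambda s, a: (1.0, s)  # placeholder dynamics and reward

q = defaultdict(float)
step_with_penalty(q, state, 10, actions, is_legal, env_step)  # below minimum
step_with_penalty(q, state, 30, actions, is_legal, env_step)  # within range
```

In this sketch the Q-value of the too-slow action becomes negative after one update while the legal action's becomes positive, but I don't know whether this is preferable to never offering the illegal action in the first place.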
Topic q-learning reinforcement-learning machine-learning
Category Data Science