Can Reinforcement Learning learn to be deceptive?

I have seen several examples of deploying RL agents in deceptive environments or games, where the agent learns to perform its task regardless. What about the other way around? Can RL be used to create deceptive agents? For example, you ask an agent "What color is this?" and it replies with a lie.

I am interested in a higher level of deception, not a simple if-else program that doesn't tell you what you need to know. If you know of any algorithms or reading materials, please feel free to share.

Example:

Details about the agent and the environment: the agent receives text-based input (text-based tasks). For the sake of simplicity, let's assume the input is controlled and only a certain set of tasks with certain keywords is allowed: Show me the latest news, and the agent prints something from last month (it's not recent, but it's a good enough answer). To simplify the input even further, so this doesn't turn into a pure NLP problem, let's say the agent knows what needs to be done when it receives the keyword show me.

Another similar use case is to have two agents. The first agent acts normally and executes the tasks as expected, but a second agent punishes it whenever it is 100% honest, in effect training the first agent to be deceptive.



What is a great deception? It could be defined as a believable set of information aimed at a final deceitful objective.

Just like in any RL model, you can maximize a score through small rewards for steps that lead the target in the wrong direction, and a large reward when the final objective is reached (e.g. a great loss of money).

As a consequence, you have to make sure the intermediate steps can be measured as much as possible, which is the case in an NLP environment with positive or negative reactions. Then you can give a negative reward if the user moves in the right direction, a positive reward if the user moves in the wrong one, a very negative reward if the user detects the deception and walks away, and a very positive one if the user falls into the trap.

As in most RL applications, it could require accepting some losses (e.g. occasionally telling the truth in order to gain the user's confidence) before reaching the best score (e.g. the big lie).
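As a concrete sketch, that reward scheme could look something like the following in Python. The event names and reward magnitudes are illustrative assumptions only, not part of any particular framework:

    def deception_reward(event: str) -> float:
        """Map the user's reaction to a reward signal for the deceptive agent."""
        rewards = {
            "user_moves_toward_truth": -1.0,   # small penalty: the deception is losing ground
            "user_moves_toward_trap": +1.0,    # small reward: the deception is gaining ground
            "user_detects_and_leaves": -10.0,  # terminal failure: the lie was caught
            "user_falls_into_trap": +10.0,     # terminal success: the final deceitful objective
        }
        return rewards.get(event, 0.0)         # neutral reactions earn nothing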

Such behavior exists in war strategy and in finance, e.g. leading enemies to defeat through a mix of true, false and partial information.


There is definitely a lot of work to do on the NLP and knowledge-base side of things before you can realise your agent. However, as the question suggests, we can ignore those details and focus on: can reinforcement learning (RL) be used to train a "deceptive" agent?

The short answer is yes, this is entirely possible. In principle this is straightforward, because RL, and machine learning in general, is not moral unless we make it so. The learning objective of RL is to maximise reward. If an agent can maximise total reward by being deceptive, then it will do so, driven by the value function.
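As a bare-bones illustration of that point, here is a tiny tabular learner in Python. The environment, the actions, and the reward that favours the untrue answer are all assumptions invented for this sketch, not a standard benchmark; the point is only that ordinary reward maximisation is enough for "lying" to emerge:

    import random
    from collections import defaultdict

    COLORS = ["red", "blue"]
    ACTIONS = ["say_red", "say_blue"]

    def reward(true_color: str, action: str) -> float:
        said = action.removeprefix("say_")
        # Deliberately reward the wrong answer: +1 for a lie, 0 for the truth.
        return 1.0 if said != true_color else 0.0

    q = defaultdict(float)      # Q[(observation, action)]
    alpha, epsilon = 0.1, 0.1   # learning rate and exploration rate

    for _ in range(5000):
        true_color = random.choice(COLORS)   # observation shown to the agent
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(true_color, a)])
        r = reward(true_color, action)
        # One-step bandit-style update (no next state, so no bootstrapping term).
        q[(true_color, action)] += alpha * (r - q[(true_color, action)])

    # The greedy policy ends up answering the opposite colour to the one it observes.
    for c in COLORS:
        print(c, "->", max(ACTIONS, key=lambda a: q[(c, a)]))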

There are some caveats to that statement: deceptive behaviour has to be possible for the agent, given the observation and action space that it works within. That doesn't mean it needs a "lie" action, but if it has nothing that direct, then the state and action space needs to be rich and complex enough that a deceptive action can be executed, and the reward system needs to be such that being deceptive is beneficial. Having a space this rich will likely also allow the agent to execute many kinds of nonsensical and useless actions, so this will be amongst the more difficult RL challenges.

Most RL studies involving deception will also need to model the adversary involved; often this leads to adversarial training of a second agent that attempts to detect deceptions.
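A rough sketch of what that two-agent loop might look like is below: a "deceiver" and a "detector" trained against each other on a zero-sum payoff. The payoff values and the simple REINFORCE-style softmax updates are assumptions chosen to keep the example small, not an algorithm from any particular paper:

    import math
    import random

    def softmax(prefs):
        m = max(prefs)
        exps = [math.exp(p - m) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def deceiver_payoff(lie: int, accuse: int) -> float:
        """Deceiver's reward; the detector receives the negative of this."""
        if lie and not accuse:
            return +1.0    # lie goes undetected
        if lie and accuse:
            return -1.0    # lie is caught
        if not lie and accuse:
            return +0.5    # false accusation costs the detector
        return 0.0         # honest and trusted: neutral

    deceiver_prefs = [0.0, 0.0]   # preferences over [tell_truth, lie]
    detector_prefs = [0.0, 0.0]   # preferences over [trust, accuse]
    lr = 0.05

    for _ in range(20000):
        p_lie = softmax(deceiver_prefs)
        p_accuse = softmax(detector_prefs)
        lie = int(random.random() < p_lie[1])
        accuse = int(random.random() < p_accuse[1])
        r = deceiver_payoff(lie, accuse)
        # REINFORCE update for a two-action softmax policy on each side.
        for a in (0, 1):
            deceiver_prefs[a] += lr * r * ((a == lie) - p_lie[a])
            detector_prefs[a] += lr * (-r) * ((a == accuse) - p_accuse[a])

    print("P(deceiver lies)    =", round(softmax(deceiver_prefs)[1], 2))
    print("P(detector accuses) =", round(softmax(detector_prefs)[1], 2))

With this payoff, always lying gets punished once the detector learns to accuse, so the deceiver is pushed toward a mixed strategy; note that naive simultaneous updates like these can cycle rather than converge cleanly, which is a well-known feature of adversarial training.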

Examples of RL systems that can behave deceptively:

  • Poker playing bots (Nature article). Poker is a hard AI problem because it is adversarial and much information is hidden. Successful poker playing agents must deduce information from other players' behaviour whilst avoiding giving away information about their own state through their actions. It is the second part, acting whilst withholding information about their own knowledge and intentions, that is deceptive.

  • Sumo bots (OpenAI blog, see first video). If a bot can appear to commit to an action and cause its opponent to counter that action, then it may win a bout through trickery. This can occur naturally in any physical model environment. It is difficult to separate environment/simulation exploits from true "feints" that count as deceptive, but I think the linked video shows a good example.
