Which ML approach to choose for the game AI when rewards are delayed?

Question: Which machine learning approach should I choose for the AI of my computer game, where the actions of the AI do not lead to immediate rewards but to delayed rewards instead?

About me: I am a complete beginner in the area of machine learning. This is my first machine learning project of my own. I have been part of other projects that included machine learning, but I have never done everything from scratch and completely by myself.

About the game: The game is similar to "Age of Wars": it is 2D and two players play against each other. Both own a castle (one on the left side, the other on the right side of the playing field) and can build units. Those units fight each other and try to reach and destroy the opponent's castle. The player who destroys the opponent's castle wins.

Features and machine learning task: The task the AI should solve is classification. Based on the feature vector input, the AI should choose one of the following labels:

  • Nothing
  • Unit 1 (Boss)
  • Unit 2 (Tank)
  • Unit 3 (Melee)
  • Unit 4 (Ranger)
  • Unit 5 (Mage)

The feature vector currently contains around 60 features, including:

  • Available money
  • The distance of the furthest unit to the enemy castle
  • Sum of unit health
  • Count of units
  • Count of units of each type as separate features
  • Cost of each unit type
  • Castle health
  • Recent damage taken / Recent damage dealt
  • Current effectiveness of each unit type against the current enemies (some units are more or less effective against other units, based on their armor and attack class)

Most features are calculated for both the AI player and the opponent of the AI player.
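
To make the data side more concrete, this is roughly how I assemble the input (a minimal Python sketch; the attribute names on the state object are placeholders, not my actual code):

```python
import numpy as np

def build_feature_vector(state) -> np.ndarray:
    """Assemble the feature vector for one player.

    The attribute names on `state` are placeholders for illustration;
    the real game-state object looks different.
    """
    features = [
        state.money,                    # available money
        state.furthest_unit_distance,   # distance of the furthest unit to the enemy castle
        state.total_unit_health,        # sum of unit health
        state.unit_count,               # count of units
        *state.unit_count_per_type,     # count of units of each type
        *state.unit_cost_per_type,      # cost of each unit type
        state.castle_health,            # castle health
        state.recent_damage_taken,
        state.recent_damage_dealt,
        *state.effectiveness_per_type,  # effectiveness of each unit type vs. current enemies
    ]
    return np.asarray(features, dtype=np.float32)

def build_model_input(own_state, enemy_state) -> np.ndarray:
    # most features are computed for both players and concatenated
    return np.concatenate([build_feature_vector(own_state),
                           build_feature_vector(enemy_state)])
```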

Game-specific details: It is possible to estimate a score live during a match; however, I think it is basically impossible to determine whether one specific action was good or bad. When a unit is purchased by a player, it takes some time until that unit reaches the enemy and attacks, possibly damaging enemies or getting damaged itself.

One big advantage: due to recent architectural adaptations, one match can be simulated around 100 times faster than a real-time game. A complete match (which takes around 5-15 minutes) can therefore be simulated in less than 10 seconds, and multiple matches can be simulated in parallel, so on my computer almost 60 matches per minute can be simulated. I have also prepared the game for machine learning: a standalone executable of the game can be called from external software (e.g. Python tooling) and takes a configuration file as input. It simulates the matches given by the configuration file and then produces a result output file listing the results of the simulated matches, as well as a lot of details about them (e.g. match duration and a detailed list of game events such as EntitySpawn, EntityDamage, EntityDeath, etc.). That output file can be read by the external tooling to evaluate the success of the current ML model.
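
For reference, the tooling side currently looks roughly like the following sketch (the executable name, file names, and JSON layout are placeholders; the real interface differs in the details):

```python
import json
import subprocess

def simulate_matches(config: dict, executable="./game_sim", workdir=".") -> dict:
    """Run a batch of matches through the standalone game executable.

    Writes a config file, invokes the simulator, and reads back the
    result file. All names here are illustrative placeholders.
    """
    config_path = f"{workdir}/config.json"
    result_path = f"{workdir}/results.json"

    with open(config_path, "w") as f:
        json.dump(config, f)

    subprocess.run([executable, config_path, result_path], check=True)

    with open(result_path) as f:
        return json.load(f)  # match results, durations, event lists, ...
```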

My current considerations: For this classification task, I think a decision tree model like a Random Forest probably makes a lot of sense. My problem is mainly that I do not know how to train it and how to connect the machine learning tooling with the game. As the data is not really labeled, I guess reinforcement learning might be a good choice here. I have already read some articles about Q-Learning. Now, there are two difficulties:

  1. The "delayed reward" topic: Usually, directly when a decision is made by the ML model, a score can be given, telling whether the decision was good/bad compared to other possible decisions. This is not possible here, because it takes some time, until it is known, whether a decision is good or bad. Additionally, even after some time, it is not known which decision contributed to the current state in a positive way and which did not.
  2. An architectural difficulty: At least with the current architecture, it is not really possible for the training process to observe individual classification decisions, as the game is decoupled from the machine learning code. Only the results of a completed match can be accessed by the machine learning code; at least right now, the ML code does not have access to the game during a running match.
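
Regarding difficulty 1: from what I have read about Q-Learning so far, the standard trick seems to be discounting, where every decision is credited with all rewards that follow it, down-weighted the further in the future they occur. A minimal sketch (the reward numbers are made up):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for every time step.

    Each decision is credited with all later rewards, exponentially
    down-weighted by the discount factor gamma.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example: a reward only arrives at the end of the episode (made-up numbers).
# Earlier decisions still receive part of the credit via discounting.
print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.9))
# -> [0.729..., 0.81, 0.9, 1.0]
```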

Although those limitations exist (the second one could be lifted if really necessary), I think there should already be many possibilities to successfully train a model. As thousands of complete matches can be simulated within just a few hours, it should (I think) be possible to do something like the following:

  1. Start with random model parameters
  2. Simulate a few matches and calculate a result score.
  3. Adapt some model parameters
  4. Repeat steps 2 and 3 again and again, with the goal of improving the score. This sounds a lot like hyperparameter tuning, which would already be possible IF there were an appropriate model with parameters that can be adjusted.

Note that the simulation accepts an ML model as input for the player AI. This makes it technologically possible to automatically try out different models and compare them.
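
To illustrate what I mean by steps 1 to 4, here is the simplest loop I can think of (it reuses the `simulate_matches` placeholder from the sketch above; the config keys and scoring are made up, and random perturbation is only meant as the most basic possible search strategy):

```python
import numpy as np

def score_from_results(results: dict) -> float:
    # Placeholder: turn the simulator's result file into a single score,
    # e.g. the win rate of the candidate model over the simulated matches.
    return results["wins"] / results["matches_played"]

def random_search(initial_params: np.ndarray, n_iterations=100, n_matches=10, noise=0.1):
    """Very simple black-box search over model parameters:
    perturb the parameters, simulate a batch of matches,
    and keep the perturbation if it scores better."""
    rng = np.random.default_rng(0)
    best = initial_params
    best_score = score_from_results(
        simulate_matches({"model_params": best.tolist(), "matches": n_matches}))
    for _ in range(n_iterations):
        candidate = best + rng.normal(0.0, noise, size=best.shape)
        results = simulate_matches({"model_params": candidate.tolist(), "matches": n_matches})
        score = score_from_results(results)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```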

This is all I can come up with right now, with my limited knowledge in the area of machine learning.

If you have any ideas or know about approaches that you think might work well, please let me know!

Additional note: The goal here is to create a strong AI. Game design topics such as "the AI should be fun to play against" should not be considered.

Thank you very much.

Tags: game, decision-trees, reinforcement-learning, random-forest, machine-learning

Category: Data Science


I would use temporal difference (TD) learning from reinforcement learning. TD learning is built exactly for the delayed-reward situation you describe: instead of waiting until the end of a match to know whether a decision was good, it updates the value estimate of a state from the reward actually received plus the discounted value estimate of the following state, so credit is propagated backwards through time to earlier decisions. Note that this is not a replacement for backpropagation: if you approximate the value or Q-function with a neural network over your ~60 features, backpropagation is still what fits the network, and in practice the two are combined (as in deep Q-learning).
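
As a minimal illustration of the TD/Q-learning update (tabular, with an abstract `state` key; for your game you would replace the table with a function approximator over the feature vector, which is where backpropagation comes in):

```python
import random
from collections import defaultdict

ACTIONS = ["Nothing", "Boss", "Tank", "Melee", "Ranger", "Mage"]
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount factor, exploration rate
Q = defaultdict(float)                   # Q[(state, action)] -> estimated discounted return

def choose_action(state):
    # epsilon-greedy: mostly pick the currently best-looking build decision,
    # sometimes explore a random one
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def td_update(state, action, reward, next_state):
    # Q-learning (a TD method): bootstrap from the estimated value of the next state,
    # so credit flows backwards to earlier decisions even when the reward only
    # arrives much later (e.g. when the enemy castle finally takes damage).
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```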
