What kind of neural network would work best for loosely-defined data, like video game RAM?
I'm trying to build out a network layer map for a neural network to use in an NES AI. Most networks I run across on web searches are CNNs that use image data to identify things. Miles and miles and miles of papers, questions, and tutorials about image-based CNNs. Even the video game machine learning AIs are generally using rendered video frames as inputs to a CNN.
I played around with SethBling's MarI/O a few years ago, but that uses NEAT, a 16-year-old algorithm, and MarI/O seems more like an interesting experiment than something that could actually beat human play. It gets stuck as soon as the level stops scrolling to the right, and it learns incredibly slowly.
My goal this time is to ditch the video feed and go straight for the NES RAM. I don't want something that can match a 4-year-old; I want an AI that could potentially beat TAS speedruns or find new tricks. There are plenty of useful hidden details in the RAM, and it already contains a map of the enemy locations. Fortunately, the NES RAM footprint is only 2048 bytes, so feeding that in as input isn't a challenge.
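For context, reading that snapshot in is trivial; something like the following is all it takes (the raw-bytes source is just a placeholder for whatever hook the emulator exposes):

```python
import torch

# Placeholder: a real hook would return the 2 KiB of NES work RAM each frame.
ram = bytes(2048)

# One entry per memory location, shape (2048,)
ram_tensor = torch.tensor(list(ram), dtype=torch.uint8)

# Keep a rolling window of the last 10 frames, shape (10, 2048)
frame_history = torch.stack([ram_tensor] * 10)
```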
The real challenge is designing an NN layer map that works for this data, which is only very loosely defined. While a complete memory map could be built for each game, that's not very practical, and it could miss memory locations that people haven't discovered yet. (The reward/penalty side will need some hard-coded locations for score, death triggers, timers, etc.)
However, the memory locations can be broken down into a few broad types (a rough sketch of how each type could be decoded follows the list):
- Integers
- Signed Integers
- Bitmaps
- Status values
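To make that concrete, here's my rough guess at how each byte could be expanded into all four representations at once. The tensor ops are my own assumption and untested against a real game:

```python
import torch

def expand_ram(ram_bytes: torch.Tensor) -> torch.Tensor:
    """Expand a (2048,) uint8 RAM snapshot into a (2048, 266) float tensor:
    1 scaled integer + 1 scaled signed integer + 8 bits + 256 one-hot values."""
    as_int = ram_bytes.to(torch.float32)

    # Unsigned integer view, scaled to 0..1
    unsigned = (as_int / 255.0).unsqueeze(1)                          # (2048, 1)
    # Signed integer view (-128..127), scaled to roughly -1..1
    signed = (((as_int + 128) % 256 - 128) / 128.0).unsqueeze(1)      # (2048, 1)

    # Bitmap view: one channel per bit 2**0 .. 2**7
    bit_positions = torch.arange(8)
    bits = ((ram_bytes.unsqueeze(1).to(torch.int64) >> bit_positions) & 1).float()  # (2048, 8)

    # Status-value view: one-hot over all 256 possible byte values
    one_hot = torch.nn.functional.one_hot(
        ram_bytes.to(torch.int64), num_classes=256
    ).float()                                                          # (2048, 256)

    return torch.cat([unsigned, signed, bits, one_hot], dim=1)         # (2048, 266)

features = expand_ram(torch.randint(0, 256, (2048,), dtype=torch.uint8))
print(features.shape)  # torch.Size([2048, 266])
```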
I'm starting to develop an NN layer map as best I can, but I don't have a template to work from, so I have no idea how well this will work (a rough PyTorch translation follows the map):
Layer 1: 2048x10
    Byte representation of RAM (uint8 ByteTensor)
    Times 10 frames
    + torch.nn.RReLU
Layer 2: 2048x266x10
    torch.stack(
        Copy of Layer 1 ( torch.view + mul(1/255) )                          # integers
        Copy of Layer 1 ( convert to signed + mul(1/255) )                   # signed integers
        Bitmap of Layer 1 ( torch.bitwise_and(a, bit) for bit in 2**0..2**7 )  # bitmaps
        Value map of Layer 1 ( torch.eq(a, val) for val in 0..255 )          # status values
    )
    + torch.nn.RReLU
Layer 3: Need something to reduce the valid activators???
Layer 4:
nn.LSTM
Layer 5: 512
Fully-connected layer to outputs
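And here's my attempt at translating that map into an actual module so it's clearer what I'm asking about. The nn.Linear standing in for the mystery Layer 3, the hidden size, and the 8-button output are pure guesses on my part:

```python
import torch
import torch.nn as nn

class NesRamNet(nn.Module):
    """Rough sketch of the layer map above. Assumes the per-byte (2048, 266)
    expansion is done outside the network for each of the 10 frames."""

    def __init__(self, n_outputs: int = 8, hidden: int = 512):
        super().__init__()
        # Stand-in for the mystery "Layer 3": brute-force squeeze of each
        # frame's 2048*266 features down to something the LSTM can digest.
        self.squeeze = nn.Sequential(
            nn.Linear(2048 * 266, hidden),
            nn.RReLU(),
        )
        # Layer 4: LSTM across the 10-frame window
        self.lstm = nn.LSTM(input_size=hidden, hidden_size=hidden, batch_first=True)
        # Layer 5: fully-connected layer to the controller outputs
        self.head = nn.Linear(hidden, n_outputs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 10 frames, 2048 locations, 266 features)
        b, t = x.shape[:2]
        x = x.reshape(b, t, -1)      # (batch, 10, 2048*266)
        x = self.squeeze(x)          # (batch, 10, hidden)
        x, _ = self.lstm(x)          # (batch, 10, hidden)
        return self.head(x[:, -1])   # act on the last frame: (batch, n_outputs)

net = NesRamNet()
dummy = torch.rand(1, 10, 2048, 266)
print(net(dummy).shape)  # torch.Size([1, 8])
```

That stand-in Linear alone is roughly 280M parameters, which is basically what question 2 below is getting at.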
A few questions:
Is this the right approach? Are there better types of networks that would work here?
With an expanded input set of 2048x266 ≈ 545K inputs, what kind of layer would I use to reduce that down to only the usable sets? There are a lot of wasted zeros for memory locations that may only hold status values up to 0x0A.
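To make that question a bit more concrete, one idea I've toyed with (no idea if it's sound) is sharing a small linear layer across all 2048 locations, so each location's 266 mostly-zero features collapse to a short dense embedding before anything gets flattened:

```python
import torch
import torch.nn as nn

# Idea: one small linear layer shared across all 2048 memory locations,
# applied independently to each location's 266 features.
per_byte = nn.Linear(266, 16)

x = torch.rand(1, 10, 2048, 266)          # (batch, frames, locations, features)
embedded = per_byte(x)                    # (1, 10, 2048, 16)
flattened = embedded.reshape(1, 10, -1)   # (1, 10, 32768) instead of (1, 10, 544768)
```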
If I decide to add the video feed back in, how do I tie that together with a completely separate network? Does that get tied into the fully-connected layer?
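For that question, is it just a matter of concatenating the two branches' feature vectors before the final fully-connected layer, something like this?

```python
import torch
import torch.nn as nn

# Dummy feature vectors standing in for the two branches' outputs.
ram_features = torch.rand(1, 512)    # e.g. last LSTM output of the RAM branch
video_features = torch.rand(1, 256)  # e.g. pooled output of a CNN over the frame

combined = torch.cat([ram_features, video_features], dim=1)  # (1, 768)
head = nn.Linear(768, 8)             # shared fully-connected layer to outputs
actions = head(combined)
```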
Should this be much, much bigger, like AlphaGo's or Leela Chess Zero's humongous NN stacks? Processing time isn't a factor, since it's not running in real time, but a single game shouldn't take months to process, either.