What kind of neural network would work best for loosely defined data, like video game RAM?

I'm trying to build out a network layer map for a neural network to use in an NES AI. Most networks I run across on web searches are CNNs that use image data to identify things. Miles and miles and miles of papers, questions, and tutorials about image-based CNNs. Even the video game machine learning AIs are generally using rendered video frames as inputs to a CNN.

I played around with SethBling's MarI/O a few years ago, but that used NEAT, a 16-year-old algorithm, and MarI/O seems more like an interesting experiment than something that could actually beat human play. It gets stuck as soon as the level stops going right, and it learns incredibly slowly.

My goal this time is to ditch the video feed and go straight for the NES RAM. I don't want something that can merely match a 4-year-old; I want an AI that could potentially beat TAS speedruns or find new tricks. There are plenty of useful details hidden in the RAM, and it already contains a map of the enemy locations. Fortunately, the NES RAM footprint is only 2048 bytes, so feeding that in as input isn't a challenge.

The real challenge is designing an NN layer map that would work for this data, which is only very loosely defined. While an entire memory map could be built for each game, that's not very practical, and it could miss memory locations that people haven't discovered yet. (The reward/penalty side will need some hard-coded locations for score, death triggers, timers, etc.)

However, the bytes can be broken down into a few basic types:

  • Integers
  • Signed Integers
  • Bitmaps
  • Status values
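To make the four interpretations concrete, here is a sketch of what each one means for a single byte (plain Python; the helper name `expand_byte` and the scaling constants are my own choices, not part of any library):

```python
def expand_byte(b: int):
    """Interpret one RAM byte (0-255) in the four ways listed above."""
    unsigned = b / 255.0                           # integer, scaled to [0, 1]
    signed = (b - 256 if b >= 128 else b) / 128.0  # two's complement, scaled to [-1, 1]
    bits = [(b >> i) & 1 for i in range(8)]        # bitmap: one feature per bit
    one_hot = [int(b == v) for v in range(256)]    # status value as a one-hot vector
    return unsigned, signed, bits, one_hot

# 0x81 is 129 unsigned, -127 signed, has bits 0 and 7 set,
# and lights up one-hot index 129.
u, s, bits, oh = expand_byte(0x81)
```

The point is that the same raw byte yields 1 + 1 + 8 + 256 = 266 candidate features, since you don't know up front which interpretation a given memory location actually uses.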

I'm starting to develop an NN layer map as best I can, but I don't have a template to lean on, so I have no idea how well this will work:

Layer 1: 2048x10
    Byte representation of RAM (ByteTensor)
        Times 10 frames

    + torch.nn.RReLU

Layer 2: 2048x266x10
    torch.stack(
        Copy      of Layer 1 ( torch.view + mul(1/255) )               # integers
        Copy      of Layer 1 ( convert to signed + mul(1/255) )        # signed integers
        Bitmap    of Layer 1 ( torch.bitwise_and(a, $bit) for 2**0-7 ) # bitmaps
        Value map of Layer 1 ( torch.eq(a, $val) for 0-255 )           # status values
    )

    + torch.nn.RReLU

Layer 3: Need something to reduce the valid activators???

Layer 4:
    nn.LSTM

Layer 5: 512
    Fully-connected layer to outputs
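For reference, the Layer 2 expansion can be done as a single vectorized preprocessing step rather than as network layers. A minimal PyTorch sketch, where the function name `expand_ram`, the (frames, bytes, features) axis order, and the 1/128 signed scaling are my own assumptions:

```python
import torch

def expand_ram(ram: torch.Tensor) -> torch.Tensor:
    """Expand a (frames, 2048) uint8 RAM tensor into (frames, 2048, 266)
    features: 1 unsigned + 1 signed + 8 bits + 256 one-hot status values."""
    b = ram.to(torch.int64)                                # int64 for shifts / one_hot
    unsigned = (b.float() / 255.0).unsqueeze(-1)           # (F, 2048, 1)
    signed = (torch.where(b >= 128, b - 256, b).float() / 128.0).unsqueeze(-1)
    bit_idx = torch.arange(8)                              # bits 2**0 .. 2**7
    bits = ((b.unsqueeze(-1) >> bit_idx) & 1).float()      # (F, 2048, 8)
    one_hot = torch.nn.functional.one_hot(b, num_classes=256).float()  # (F, 2048, 256)
    return torch.cat([unsigned, signed, bits, one_hot], dim=-1)

ram = torch.randint(0, 256, (10, 2048), dtype=torch.uint8)
feats = expand_ram(ram)   # shape (10, 2048, 266), matching the 2048x266x10 above
```

Doing this outside the network means the learned layers only see the expanded features, and nothing in the expansion itself needs gradients.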

A few questions:

  1. Is this the right approach? Are there better types of networks that would work here?

  2. With an expanded input set of 2048x266=544K inputs, what kind of layer would I use to reduce that down to only the usable features? There are a lot of wasted zeros for memory locations that may only ever hold statuses up to 0x0A.

  3. If I decide to add the video feed back in, how do I tie that together with a completely separate network? Does that get tied into the fully-connected layer?

  4. Should this be much, much bigger, like AlphaGo's or Leela Chess Zero's humongous NN stacks? Processing time isn't a factor, since it's not running in real time, but a single game shouldn't take months to process, either.

