Training a model that has both 2D and 1D features using a CNN
I'm looking to pre-train a model for an RL agent but I'm having some trouble figuring some stuff out.
Dataset: MineRL MineRLNavigateDense-v0
The observation space includes:
- a 2D screen input of shape (64, 64) with 3 color channels,
- a 1D (scalar) compass angle, and
- a 1D (scalar) count of dirt blocks,

all of which are recorded over time.
I am also given the reward for each action the human took.
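For concreteness, here is roughly how I'm picturing one episode's worth of data, just as placeholder NumPy arrays with a made-up episode length (this is not the actual MineRL loading code):

```python
import numpy as np

T = 1000  # placeholder episode length

# 2D screen input over time: T frames of 64x64 RGB
frames = np.zeros((T, 64, 64, 3), dtype=np.uint8)

# scalar observations over time
compass_angle = np.zeros((T, 1), dtype=np.float32)
dirt_count = np.zeros((T, 1), dtype=np.float32)

# reward the human received at each timestep
rewards = np.zeros((T,), dtype=np.float32)
```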
When training a CNN for time-series classification, my understanding is that each of the k features at a given point in time is a scalar value (one box in a row), and the time component is captured by each row being a new timestep, i.e. the input is a 2D array of shape (timesteps, k).
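So for purely scalar features I would expect something like the following untested Keras sketch, where the number of features, the window length, the layer sizes, and the number of classes are all made up:

```python
from tensorflow.keras import layers, models

k = 8        # number of scalar features per timestep (made up)
window = 32  # timesteps per training window (made up)

model = models.Sequential([
    # input is a 2D array: rows = timesteps, columns = the k features
    layers.Conv1D(32, kernel_size=3, activation="relu",
                  input_shape=(window, k)),
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(10, activation="softmax"),  # e.g. 10 classes (made up)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```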
My question is: how do you set up the training data so that this sort of CNN can be used, given that the components of each datapoint have different dimensions? Do we just flatten the screen input into 64*64*3 = 12288 new features? That feels wrong.
UPDATE:
My first problem, combining the 2D image and the scalar data points, was answered in a Discord group I am in. In Keras, at least, there are special "merge layers" (https://keras.io/layers/merge/): a CNN can be applied to the image first, then merged with the scalar data, and the result can be passed into a 1D CNN to add the temporal component. I haven't actually done any of this yet, though. :)
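For reference, here is a rough, untested sketch of how I imagine that would look with the Keras functional API; the layer sizes, the window length, and the reward-regression head are all guesses on my part:

```python
from tensorflow.keras import layers, models

window = 32  # timesteps per training window (made up)

# per-timestep inputs
frames = layers.Input(shape=(window, 64, 64, 3), name="frames")
scalars = layers.Input(shape=(window, 2), name="scalars")  # compass angle + dirt count

# apply the same small 2D CNN to every frame
x = layers.TimeDistributed(layers.Conv2D(16, 3, strides=2, activation="relu"))(frames)
x = layers.TimeDistributed(layers.Conv2D(32, 3, strides=2, activation="relu"))(x)
x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)  # -> (window, 32)

# merge the per-frame embedding with the scalar features at each timestep
merged = layers.Concatenate(axis=-1)([x, scalars])  # -> (window, 32 + 2)

# 1D CNN over the time axis
t = layers.Conv1D(64, kernel_size=3, activation="relu")(merged)
t = layers.GlobalAveragePooling1D()(t)
reward = layers.Dense(1, name="reward")(t)  # e.g. predict the human's reward (my guess at a target)

model = models.Model(inputs=[frames, scalars], outputs=reward)
model.compile(optimizer="adam", loss="mse")
```

As I understand it, the TimeDistributed wrapper just applies the same 2D CNN to every frame, so the per-frame embedding can be concatenated with the scalars at each timestep before the Conv1D runs over the time axis.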
Topic cnn reinforcement-learning neural-network
Category Data Science