To me this looks like a clear case of overfitting, and the main reason is probably that your model is far too complex for the problem. To differentiate between over- and underfitting, you can think about learning in the following way.
The data given as examples (the training data) contains the behavior of the true model plus some additional noise, and you cannot cleanly separate the two. When you learn, what you are actually doing is trying to approximate the true function with a model of some sort.
In this process you bring some learning biases: you assume some possible forms for the true function (each learning method assumes a family of functions it can approximate) and also some shape for the noise (also encoded in the model structure and its parameters).
Here you can encounter two situations. If the model is unable to build an approximation by construction (consider trying to fit a linear model to a sinusoid-shaped true function with noise, for example), then you will not learn much from the training data, but the test data will show a similar error. This is because a less flexible model is very robust (at the extreme, you can fit a constant model for regression; the constant model will most of the time have similar error on train and test).
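To make this concrete, here is a minimal sketch (my own illustration, not from your setup) of that exact scenario: a linear model fit to a noisy sinusoid. The model cannot represent the curve by construction, so the errors are high but comparable on train and test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the true function is a sinusoid, observed with noise.
def true_function(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 30)
y_train = true_function(x_train) + rng.normal(0, 0.2, 30)
x_test = rng.uniform(0, 1, 30)
y_test = true_function(x_test) + rng.normal(0, 0.2, 30)

# A degree-1 (linear) model cannot represent the sinusoid by construction.
coeffs = np.polyfit(x_train, y_train, deg=1)

def mse(x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

train_err = mse(x_train, y_train)
test_err = mse(x_test, y_test)
# Both errors are high and of similar magnitude: the rigid model
# underfits, but it is robust, so train and test behave alike.
```

Run it and you will see that neither error goes to zero, and neither blows up relative to the other.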
If your model is too flexible, then you can approximate the true function, but also more than that: you incorporate many of the errors from the training sample as well. That means you approximate the training data too well (low error on training), while you will most of the time be far from the test data (its noise is different from the training noise), so you get high errors there.
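The mirror-image sketch (same hypothetical sinusoid setup as above): give the model far more flexibility than the problem needs, a high-degree polynomial here, and it chases the training noise. Train error collapses while test error grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_function(x):
    return np.sin(2 * np.pi * x)

x_train = np.sort(rng.uniform(0, 1, 15))
y_train = true_function(x_train) + rng.normal(0, 0.2, 15)
x_test = np.sort(rng.uniform(0, 1, 15))
y_test = true_function(x_test) + rng.normal(0, 0.2, 15)

# A degree-12 polynomial on 15 points is flexible enough to chase
# the noise in the training sample, not just the underlying sinusoid.
coeffs = np.polyfit(x_train, y_train, deg=12)

train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
# train_err is near zero; test_err is much larger, especially near
# the interval edges where the wiggly fit behaves badly.
```

This is exactly the pattern you describe in your question: a widening gap between training and test error as capacity grows.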
Since nobody can perfectly separate the true function from the noise, no learning model can either. Very often the first questions when you start building models for a problem are: how complex is my problem, and how flexible is my model? Since you are talking about deep nets, a million parameters is a lot (despite the general indecent hype around the topic, where adding millions of additional parameters and burning a lot of money on training is considered a happy activity). My 2 cents: start with simpler models, especially ones which underfit; they can give you an idea of what kind of battle you have to win.
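The "start simple and add capacity" advice can itself be sketched as a small loop (again on the toy sinusoid, with polynomial degree standing in for model complexity): fit models of increasing flexibility and keep the one with the lowest held-out error.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_function(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 40)
y_train = true_function(x_train) + rng.normal(0, 0.2, 40)
x_val = rng.uniform(0, 1, 40)
y_val = true_function(x_val) + rng.normal(0, 0.2, 40)

# Start from the simplest model (a constant) and sweep upward,
# scoring each candidate on held-out data rather than training data.
val_errors = {}
for deg in range(0, 10):
    coeffs = np.polyfit(x_train, y_train, deg)
    val_errors[deg] = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)

best_deg = min(val_errors, key=val_errors.get)
# A constant or a line underfits, very high degrees overfit;
# the validation error selects a moderate degree in between.
```

The underfitting models at the start of the sweep are informative, not wasted: the gap between their error and the best model's error tells you how much structure there is left to capture.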