RCNN to predict a sequence of images (video frames)?
In the following work, the authors apply a convolutional recurrent neural network (RNN) to predict the spatiotemporal evolution of a microstructure represented as a sequence of 2D images. Specifically, they use a combined 3D-CNN + LSTM architecture to predict crystal growth:
In the picture above, we can see the RNN predictions (P) versus the ground truth (G) for a test case, in which the RNN outputs 50 frames based on 10 input frames.
Now, this is something new to me: how is it possible for an RCNN to generate images as output? From my (limited) knowledge, the only architectures able to generate new images are generative models such as GANs and convolutional encoder-decoder networks (like VAEs), but apparently the authors achieve these results solely by stacking 3D convolutions and RNN units together.
Have you ever seen this kind of architecture before?
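For context, here is a minimal toy sketch (not the paper's code; the class name, kernel sizes, and training-free random weights are all my own assumptions) of why such an architecture can emit images: in a ConvLSTM-style cell, every gate is a convolution rather than a dense matrix multiply, so the hidden state itself stays a 2D image and can be read out directly as a predicted frame, no GAN or decoder required.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 'same' 2D cross-correlation with zero padding
    (kernel not flipped, as is standard in deep learning)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyConvLSTM:
    """Single-channel ConvLSTM-style cell (hypothetical, untrained).
    All four gates (i, f, o, g) are convolutions, so the hidden state
    h -- and therefore each output frame -- remains an HxW image."""
    def __init__(self, ksize=3, seed=0):
        rng = np.random.default_rng(seed)
        self.kx = {g: rng.normal(0, 0.1, (ksize, ksize)) for g in "ifog"}
        self.kh = {g: rng.normal(0, 0.1, (ksize, ksize)) for g in "ifog"}

    def step(self, x, h, c):
        pre = {g: conv2d_same(x, self.kx[g]) + conv2d_same(h, self.kh[g])
               for g in "ifog"}
        i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
        g = np.tanh(pre["g"])
        c = f * c + i * g
        h = o * np.tanh(c)  # h is an HxW image, usable as an output frame
        return h, c

# Condition on 10 "input frames", then roll the cell forward 50 steps,
# feeding each predicted frame back in -- the same autoregressive scheme
# the figure suggests (10 inputs -> 50 predicted frames).
H = W = 16
cell = TinyConvLSTM()
h = np.zeros((H, W))
c = np.zeros((H, W))
frames = [np.random.default_rng(1).random((H, W)) for _ in range(10)]
for x in frames:
    h, c = cell.step(x, h, c)
predictions = []
for _ in range(50):
    x = h                      # previous output becomes the next input
    h, c = cell.step(x, h, c)
    predictions.append(h)
print(len(predictions), predictions[0].shape)
```

Of course this toy cell is untrained, so the frames it emits are meaningless, but it shows the mechanism: the recurrence is purely convolutional, so image-in/image-out prediction falls out of the architecture itself.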
Topic recurrent-neural-network generative-models gan cnn
Category Data Science