How are session-parallel mini-batches used to train RNNs for session-based recommendation?

I am reading this paper on session-based recommenders with RNNs: https://arxiv.org/abs/1511.06939. During the training phase, the authors apply what they call session-parallel mini-batches (illustrated by a figure in the paper).

What is not clear to me is how they take items from different sessions and feed them into the network while maintaining a separate hidden state for each session. The only explanation I could come up with is to maintain as many networks as there are parallel sessions, use one network per session, and update the weights of every network with the same gradient, namely the average gradient over the batch. However, I do not know whether this is common in practice, or even correct.
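If my explanation is on the right track, I imagine it would in practice be implemented not as literally separate networks, but as a single network applied to a mini-batch with one hidden-state row per active session; when a session ends, the next session takes over that row and only that row's hidden state is reset. Below is a minimal PyTorch sketch of what I am picturing. The toy sessions, the GRUCell/Adagrad choices, and the single-step truncation of backpropagation are all my own assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn

# Toy click sessions (time-ordered item indices) -- made-up data for illustration.
sessions = [[0, 1, 2, 3], [4, 5], [6, 7, 8], [9, 10, 11, 12], [13, 14]]
n_items, hidden_size, batch_size = 15, 16, 2

embed = nn.Embedding(n_items, hidden_size)
gru = nn.GRUCell(hidden_size, hidden_size)   # ONE shared network for all sessions
out = nn.Linear(hidden_size, n_items)        # scores over the whole item catalog
loss_fn = nn.CrossEntropyLoss()
params = list(embed.parameters()) + list(gru.parameters()) + list(out.parameters())
opt = torch.optim.Adagrad(params, lr=0.05)

hidden = torch.zeros(batch_size, hidden_size)  # one hidden-state row per slot
slot = list(range(batch_size))                 # which session each slot is reading
pos = [0] * batch_size                         # current position inside that session
nxt = batch_size                               # next session waiting to be assigned

done = False
while not done:
    # Input = current item of each active session; target = its next item.
    x = torch.tensor([sessions[s][p] for s, p in zip(slot, pos)])
    y = torch.tensor([sessions[s][p + 1] for s, p in zip(slot, pos)])

    hidden = gru(embed(x), hidden)
    loss = loss_fn(out(hidden), y)   # one loss averaged over the mini-batch
    opt.zero_grad()
    loss.backward()                  # hence one averaged gradient for the shared weights
    opt.step()
    hidden = hidden.detach()         # truncate backprop to one step (my assumption)

    # Advance each slot; when its session runs out, swap in the next session
    # and reset only that slot's hidden state.
    for i in range(batch_size):
        pos[i] += 1
        if pos[i] + 1 >= len(sessions[slot[i]]):      # no next item left as target
            if nxt >= len(sessions):                  # no sessions left to assign;
                done = True                           # a real implementation would
                break                                 # mask finished slots instead
            slot[i], pos[i], nxt = nxt, 0, nxt + 1
            hidden[i].zero_()                         # fresh session, fresh state
```

If this is right, the per-row hidden-state reset is what keeps sessions independent even though they share one set of weights, and the weight update is a single averaged gradient step over the batch, which seems equivalent to my "many networks with shared weights" picture.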

How are the authors using session-parallel mini-batches?

Tags: mini-batch-gradient-descent, rnn, recommender-system
