Minibatches when training on two datasets of different size
Suppose I have two datasets, $X$ and $Y$, of different sizes. I am training two networks together, one which takes inputs $x\in X$, and the other takes inputs $y\in Y$. The two networks share parameters and therefore are trained together.
Are there guidelines on how to choose the batch sizes for the samples from $X$ vs. those from $Y$? That is, should the batches from $X$ have the same size as the batches from $Y$?
In general, the two networks can differ greatly in their number of parameters, and the total number of training points available in $X$ can be very different from the number of points in $Y$.
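To make the question concrete, here is a minimal sketch (in numpy, with made-up dataset sizes) of the two batch-sizing strategies I am asking about: equal batch sizes for both datasets, vs. batch sizes proportional to the dataset sizes.

```python
import numpy as np

# Hypothetical setup: X and Y are datasets of very different sizes.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 16))  # 10k samples, 16 features
Y = rng.normal(size=(1_200, 16))   # 1.2k samples

def sample_batch(data, batch_size, rng):
    """Draw a random minibatch (with replacement) from a dataset."""
    idx = rng.integers(0, len(data), size=batch_size)
    return data[idx]

# Option 1: equal batch sizes regardless of dataset size.
bx = by = 64
x_batch = sample_batch(X, bx, rng)
y_batch = sample_batch(Y, by, rng)

# Option 2: batch sizes proportional to dataset sizes, so both
# datasets are cycled through in the same number of steps.
total = 128
bx = round(total * len(X) / (len(X) + len(Y)))
by = total - bx
```

With Option 1 the smaller dataset is revisited more often per epoch of the larger one; with Option 2 both are consumed at the same relative rate. My question is which of these (or something else) is recommended when the networks share parameters.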
Topic mini-batch-gradient-descent
Category Data Science