Training an ensemble of small neural networks efficiently in TensorFlow 2
I have a bunch of small neural networks (say, 5 to 50 feed-forward networks with only two hidden layers of 10-100 neurons each), which differ only in their weight initialization. I want to train them all on the same, smallish dataset (say, 10K rows) with a batch size of 1, and then combine them into an ensemble by averaging their results.
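For reference, the most direct way to do this would be to train each network separately and average the predictions afterwards, roughly like the following sketch (the placeholder data, the Adam optimizer, and the binary cross-entropy loss are my assumptions, the loss chosen to match the sigmoid output):

import numpy as np
import tensorflow as tf

def make_small_net(inputs: int, width: int) -> tf.keras.Model:
    # one small feed-forward network; only the random initialization differs between calls
    return tf.keras.Sequential([
        tf.keras.layers.Dense(width, activation='tanh', input_shape=(inputs,)),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])

# placeholder data standing in for the real 10K-row dataset
X = np.random.rand(10000, 30).astype('float32')
y = np.random.randint(0, 2, size=(10000, 1)).astype('float32')

models = [make_small_net(inputs=30, width=10) for _ in range(5)]
for m in models:
    m.compile(optimizer='adam', loss='binary_crossentropy')
    m.fit(X, y, batch_size=1, epochs=1)  # one sample per gradient step

# the ensemble prediction is the average of the individual predictions
ensemble_prediction = np.mean([m.predict(X) for m in models], axis=0)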
Now, of course I can build the whole ensemble as one neural network in TensorFlow/Keras, like this:
import tensorflow as tf


def bagging_ensemble(inputs: int, width: int, weak_learners: int):
    r'''Return a generic dense network model

    inputs: number of columns (features) in the input data set
    width: number of neurons in the hidden layer of each weak learner
    weak_learners: number of weak learners in the ensemble
    '''
    assert width >= 1, 'width is required to be at least 1'
    assert weak_learners >= 1, 'weak_learners is required to be at least 1'

    activation = tf.keras.activations.tanh
    kernel_initializer = tf.initializers.GlorotUniform()

    input_layer = tf.keras.Input(shape=(inputs,))

    # build the hidden part as a list of weak learners, all fed from the same input
    hidden = []
    for _ in range(weak_learners):
        weak_learner = tf.keras.layers.Dense(units=width, activation=activation,
                                             kernel_initializer=kernel_initializer)(input_layer)
        weak_learner = tf.keras.layers.Dense(units=1, activation=tf.keras.activations.sigmoid)(weak_learner)
        hidden.append(weak_learner)

    output_layer = tf.keras.layers.Average()(hidden)  # average the weak learners' outputs
    return tf.keras.Model(input_layer, output_layer)
example_model = bagging_ensemble(inputs=30, width=10, weak_learners=5)
tf.keras.utils.plot_model(example_model)
The resulting model's plot looks like this:
However, training the model is slow, and because of the batch size of 1, a GPU doesn't really speed up the process. How can I make better use of the GPU when training such a model in TensorFlow 2, without using a larger batch size?
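For context, the training call I'm referring to is nothing more than a standard compile/fit with batch_size=1; the placeholder data, Adam optimizer, and binary cross-entropy loss below are my assumptions (the loss chosen because the output layer is a sigmoid):

import numpy as np

# placeholder data standing in for the real 10K-row dataset
X_train = np.random.rand(10000, 30).astype('float32')
y_train = np.random.randint(0, 2, size=(10000, 1)).astype('float32')

model = bagging_ensemble(inputs=30, width=10, weak_learners=5)
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.BinaryCrossentropy())

# batch_size=1 means 10,000 tiny gradient steps per epoch, so the GPU sits mostly idle
model.fit(X_train, y_train, batch_size=1, epochs=10)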
[The motivation for using this kind of ensemble is the following: Especially with small datasets, each of the small networks will yield different results because of different random initializations. By bagging as demonstrated here, the quality of the resulting model is greatly enhanced. If you're interested in the thorough neural network modelling approach this technique comes from, look for the works of H. G. Zimmermann.]
Topic training keras tensorflow ensemble-modeling neural-network
Category Data Science