Generate a balanced batch with ImageDataGenerator() and flow_from_directory()
Hi, I am new to Python and deep learning. I am working on a multiclass classification problem with 3 classes, and my dataset is imbalanced: the classes make up about 50%, 40%, and 20% of the data. I am trying to generate mini-batches with balanced classes. I am passing class_weight to fit_generator() to get balanced batches, but I doubt that it actually works, because the batches generated by train_datagen.flow_from_directory() are not balanced: the class proportions in the generated batches are around [0.43, 0.38, 0.19].
My code is as follows:
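import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import EarlyStopping, ModelCheckpoint
from sklearn.utils import class_weight

# Augmentation for the training images.
# Note: featurewise_center=True only takes effect after train_datagen.fit(...) has been called on sample data.
train_datagen = ImageDataGenerator(rescale=1./255,
                                   featurewise_center=True,
                                   rotation_range=30,
                                   width_shift_range=0.3,
                                   height_shift_range=0.3,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='constant')
# Assumption: test_datagen is not defined in the post; rescaling only is a common choice for validation data.
test_datagen = ImageDataGenerator(rescale=1./255)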
# Training set
train_set = train_datagen.flow_from_directory(
    directory=train_folder,
    target_size=input_shape[:2],
    batch_size=32,
    shuffle=True,
    class_mode='categorical')
# Validation set
val_set = test_datagen.flow_from_directory(
    directory=val_folder,
    target_size=input_shape[:2],
    batch_size=32,
    class_mode='categorical',
    shuffle=True)
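call_backs = [EarlyStopping(monitor='val_loss', patience=train_patience),
              ModelCheckpoint(filepath=output_model, monitor='val_loss', save_best_only=True)]
# Keyword arguments are required by recent scikit-learn versions.
class_weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_set.classes),
    y=train_set.classes)
# Keras expects a dict mapping class index to weight.
class_weights = dict(enumerate(class_weights))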
batch_size = 32  # same value as passed to flow_from_directory() above
history = model.fit_generator(
    train_set,
    steps_per_epoch=2000 // batch_size,
    epochs=300,
    validation_data=val_set,
    validation_steps=800 // batch_size,
    class_weight=class_weights,
    verbose=1,
    callbacks=call_backs)
Is this code enough to generate balanced batches for training? I checked the batches generated from train_set, and their class proportions are around [0.43, 0.38, 0.19]. Any advice would be appreciated. Thank you.
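For context, class_weight in Keras rescales each sample's contribution to the loss; it does not change which images flow_from_directory() draws, so batch proportions near [0.43, 0.38, 0.19] are expected even when the weights are applied. The sketch below uses only NumPy to contrast the two ideas. It assumes integer labels shaped like train_set.classes; the toy label array and both helper names (balanced_class_weights, balanced_batch_indices) are hypothetical and not part of Keras or scikit-learn.

```python
import numpy as np

def balanced_class_weights(labels):
    """Formula scikit-learn uses for 'balanced': n_samples / (n_classes * count)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

def balanced_batch_indices(labels, batch_size, rng):
    """Hypothetical helper: pick sample indices with equal counts per class.

    Sampling is done with replacement, so minority classes are oversampled.
    """
    classes = np.unique(labels)
    per_class = batch_size // len(classes)
    picks = [rng.choice(np.flatnonzero(labels == c), size=per_class, replace=True)
             for c in classes]
    return rng.permutation(np.concatenate(picks))

# Toy labels with roughly the proportions observed in the batches (assumed data).
labels = np.repeat([0, 1, 2], [43, 38, 19])

# Option 1: loss weighting. Batches stay imbalanced, but weight * count is equal per class.
weights = balanced_class_weights(labels)
weighted_totals = [weights[c] * np.sum(labels == c) for c in (0, 1, 2)]

# Option 2: balanced sampling. Each batch holds the same number of samples per class.
rng = np.random.default_rng(0)
batch = balanced_batch_indices(labels, batch_size=30, rng=rng)
batch_counts = np.bincount(labels[batch], minlength=3)
```

With loss weighting, the rare class simply counts for more in each gradient step. Balanced sampling changes the batch composition itself, but feeding such indices to Keras would need a custom generator, since flow_from_directory() has no option for it as far as I know.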
Topic multiclass-classification class-imbalance classification
Category Data Science