Generate a balanced batch with ImageDataGenerator() and flow_from_directory()

Hi, I am new to Python and deep learning. I am working on a multiclass classification problem with 3 classes, and the dataset is imbalanced: the classes are in a ratio of roughly 50:40:20. I want to train on mini-batches with balanced classes. I am passing class_weight to fit_generator() expecting it to balance the batches, but I doubt it works, because the batches generated by train_datagen.flow_from_directory() are not balanced: their class proportions are around [0.43, 0.38, 0.19]. My code is as follows:

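import numpy as np
from sklearn.utils import class_weight
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

batch_size = 32

train_datagen = ImageDataGenerator(rescale=1./255,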
                                    featurewise_center=True,  # no effect until train_datagen.fit(sample_images) is called
                                    rotation_range=30,
                                    width_shift_range=0.3,
                                    height_shift_range=0.3,
                                    shear_range=0.2,
                                    zoom_range=0.2,
                                    horizontal_flip=True,
                                    fill_mode='constant')

#Training Set
train_set = train_datagen.flow_from_directory(
                                             directory=train_folder,
                                             target_size=input_shape[:2],
                                             batch_size=32,
                                             shuffle=True,
                                             class_mode='categorical')
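#Validation Set
# (assumed definition: test_datagen is not shown in the question; validation images are only rescaled)
test_datagen = ImageDataGenerator(rescale=1./255)
val_set = test_datagen.flow_from_directory(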
                                            directory=val_folder,
                                            target_size=input_shape[:2],
                                            batch_size = 32,
                                            class_mode='categorical',
                                            shuffle=True)

call_backs = [EarlyStopping(monitor='val_loss', patience=train_patience),
             ModelCheckpoint(filepath=output_model, monitor='val_loss', save_best_only=True)]

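# compute_class_weight returns a NumPy array, but Keras expects class_weight
# to be a dict mapping class index -> weight
class_weights = class_weight.compute_class_weight(
                class_weight='balanced',
                classes=np.unique(train_set.classes),
                y=train_set.classes)
class_weights = dict(enumerate(class_weights))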

history = model.fit_generator(
          train_set,
          steps_per_epoch=2000 // batch_size,
          epochs=300,
          validation_data=val_set,
          validation_steps=800 // batch_size,
          class_weight=class_weights,
          verbose=1,
          callbacks=call_backs)
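For reference, scikit-learn's 'balanced' mode sets each class weight to n_samples / (n_classes * count_of_that_class). The sketch below reproduces that formula in plain NumPy and returns the {class_index: weight} dict form that Keras expects. The helper name balanced_class_weights is hypothetical (not a library function), and the label vector is made up to mimic a 43/38/19 split; it is not the asker's data.

```python
import numpy as np

def balanced_class_weights(labels):
    """Same heuristic as class_weight='balanced': n_samples / (n_classes * count)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    # Keras wants a dict {class_index: weight}, not a NumPy array
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Hypothetical labels with a 43/38/19 split over 100 samples
labels = np.array([0] * 43 + [1] * 38 + [2] * 19)
weights = balanced_class_weights(labels)
print({c: round(w, 3) for c, w in weights.items()})  # -> {0: 0.775, 1: 0.877, 2: 1.754}

# Weight times class count gives the same total (100 / 3) for every class
print({c: round(w * int(np.sum(labels == c)), 2) for c, w in weights.items()})  # -> {0: 33.33, 1: 33.33, 2: 33.33}
```

That equal total is the sense in which the weighting is "balanced": the batches themselves are untouched, but each class carries the same total weight in the loss.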

Is this code enough to generate balanced batches for training? I checked the batches generated by train_set, and their class proportions are around [0.43, 0.38, 0.19]. Any advice would be appreciated. Thank you.
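One way to measure the class mix of the generated batches is to average the one-hot label rows over many batches. With class_mode='categorical', flow_from_directory() yields (images, one_hot_labels) tuples indefinitely, so the sketch below only assumes an iterable of such tuples and stops after a fixed number of batches. The names batch_class_proportions and fake_batches are hypothetical; the synthetic generator stands in for train_set and draws labels with probabilities 0.43/0.38/0.19.

```python
import numpy as np

def batch_class_proportions(batches, num_batches):
    """Average class proportions over the first `num_batches` (x, y_onehot) batches."""
    totals = None
    n_samples = 0
    for i, (_, y) in enumerate(batches):
        if i >= num_batches:
            break  # Keras directory iterators loop forever, so stop explicitly
        counts = y.sum(axis=0)  # per-class counts in this batch
        totals = counts if totals is None else totals + counts
        n_samples += y.shape[0]
    return totals / n_samples

# Hypothetical stand-in for train_set: labels drawn i.i.d. with fixed class probabilities
rng = np.random.default_rng(0)

def fake_batches(batch_size=32, p=(0.43, 0.38, 0.19)):
    while True:
        labels = rng.choice(len(p), size=batch_size, p=p)
        yield np.zeros((batch_size, 1)), np.eye(len(p))[labels]

print(batch_class_proportions(fake_batches(), num_batches=200))
```

Because a shuffled generator samples every image with equal probability, the measured proportions simply track the dataset's own class proportions; shuffling alone cannot make them balanced.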


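class_weight does not influence the composition of the batches. Instead, it scales each sample's term in the loss function by the weight assigned to that sample's class, so with 'balanced' weights, mistakes on the rarer classes cost more than mistakes on the common ones.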
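A small NumPy sketch of that idea for categorical cross-entropy: each sample's loss is multiplied by the weight of its true class, so the same batch produces a different loss value but contains exactly the same samples. The plain mean used here is a simplification and may not match Keras's exact reduction; this illustrates the idea and is not Keras's implementation. If you really want equal class counts in every batch, that has to come from the sampler instead. balanced_batch_indices below is a hypothetical helper (not part of Keras or ImageDataGenerator) that draws the same number of indices per class with replacement, i.e. it oversamples the minority class.

```python
import numpy as np

def weighted_cross_entropy(y_true, y_pred, class_weight):
    """Mean categorical cross-entropy with each sample scaled by its class weight.

    y_true: one-hot labels, shape (batch, n_classes)
    y_pred: predicted probabilities, shape (batch, n_classes)
    class_weight: dict {class_index: weight}, the same form passed to Keras
    """
    eps = 1e-7
    per_sample = -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=1)
    true_class = np.argmax(y_true, axis=1)
    sample_weight = np.array([class_weight[int(c)] for c in true_class])
    return float(np.mean(per_sample * sample_weight))


def balanced_batch_indices(labels, batch_size, rng):
    """Hypothetical sampler: equal number of indices per class, drawn with replacement."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    per_class = batch_size // len(classes)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=per_class, replace=True)
        for c in classes
    ])
    rng.shuffle(idx)  # mix classes within the batch
    return idx


# Same batch, with and without class weights: only the loss value changes
y_true = np.eye(3)[[0, 0, 1, 2]]
y_pred = np.full((4, 3), 1 / 3)  # uninformative predictions
print(weighted_cross_entropy(y_true, y_pred, {0: 1.0, 1: 1.0, 2: 1.0}))
print(weighted_cross_entropy(y_true, y_pred, {0: 0.5, 1: 1.0, 2: 3.0}))  # larger

# A batch drawn from an imbalanced (hypothetical) label vector, balanced by the sampler
labels = np.array([0] * 43 + [1] * 38 + [2] * 19)
batch = balanced_batch_indices(labels, batch_size=30, rng=np.random.default_rng(0))
print(np.bincount(labels[batch]))  # -> [10 10 10]
```

Wiring such indices into training would require your own data loader (for example a keras.utils.Sequence subclass that loads the selected files); that part is not shown here.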

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.