Why is validation accuracy going down after augmentation?

My main question is about augmentation.

I assumed that training with augmented data is always better than training with less data,

but in my case the validation accuracy went down:

train: 7000 images, validation: 3000 images → validation accuracy: 0.89

train: 40000 images, validation: 17990 images → validation accuracy: 0.85

My augmentation code:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

def data_augmentation_folder(trainImagesPath, saveDir):
    # load and clean the training images (cleanData is defined elsewhere)
    X_train = cleanData(trainImagesPath)
    X_train = np.array(X_train)
    print(X_train[0].shape)

    for i in range(5):
        # i == 0: mild geometric, flip and brightness augmentation
        datagen = ImageDataGenerator(rotation_range=15,
                                     width_shift_range=0.1,
                                     height_shift_range=0.1,
                                     shear_range=0.01,
                                     zoom_range=[0.9, 1.25],
                                     horizontal_flip=True,
                                     vertical_flip=False,
                                     fill_mode='reflect',
                                     data_format='channels_last',
                                     brightness_range=[0.5, 1.5])
        if i == 1:
            # feature-wise normalization plus large rotations
            datagen = ImageDataGenerator(featurewise_center=True,
                                         featurewise_std_normalization=True,
                                         rotation_range=90,
                                         width_shift_range=0.1,
                                         height_shift_range=0.1)
        elif i == 2:
            datagen = ImageDataGenerator(featurewise_center=True,
                                         featurewise_std_normalization=True,
                                         rotation_range=100,
                                         width_shift_range=0.1,
                                         height_shift_range=0.1)
        elif i == 3:
            datagen = ImageDataGenerator(rescale=1./255,
                                         shear_range=0.2,
                                         zoom_range=0.2,
                                         horizontal_flip=True)
        elif i == 4:
            datagen = ImageDataGenerator(rescale=1./255,
                                         shear_range=0.1,
                                         rotation_range=80,
                                         zoom_range=0.1,
                                         horizontal_flip=True,
                                         brightness_range=[0.5, 1.5])

        # needed for featurewise_center / featurewise_std_normalization:
        # computes the dataset-wide mean and std
        datagen.fit(X_train)

        # draw one batch of augmented images and write it to saveDir
        for x, y in datagen.flow(X_train, np.arange(X_train.shape[0]),
                                 shuffle=True, save_to_dir=saveDir,
                                 save_format='jpg', save_prefix='aug'):
            assert x.shape[1:] == X_train.shape[1:]
            break  # stop after the first batch
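Note that because of the break, the inner loop writes only a single batch (32 images by default) per configuration. A sketch of how one full augmented copy of the dataset per configuration could be saved instead (the batch_size value is an assumption):

batch_size = 32
n_batches = int(np.ceil(X_train.shape[0] / batch_size))
flow = datagen.flow(X_train, np.arange(X_train.shape[0]),
                    batch_size=batch_size, shuffle=True,
                    save_to_dir=saveDir, save_format='jpg',
                    save_prefix='aug')
for batch_idx, (x, y) in enumerate(flow):
    if batch_idx + 1 >= n_batches:
        break  # one augmented pass over the dataset has been written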

Questions

  1. Why would the validation accuracy go down even though I applied augmentation?

  2. What do you have to watch out for when applying augmentation?

Topic image-preprocessing data-augmentation keras image-classification

Category Data Science


...In my case the validation accuracy went down

That's normal, since you're not doing an "apples to apples" comparison. The actual performance of the model could have gone up or stayed the same, but that's not directly visible to you because you're using a different validation set for each model. The model trained on the augmented dataset could be doing better at classifying the original images while doing only okay on the augmented ones, and since the augmented images outnumber the original ones by almost 6 to 1, the overall accuracy will suffer.
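For instance (numbers purely illustrative): if the second model scores 0.95 on the ~3000 original-style validation images but only 0.83 on the ~14990 augmented ones, the blended accuracy is (3000 × 0.95 + 14990 × 0.83) / 17990 ≈ 0.85, even if its performance on real images actually improved.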

In general, your validation set should not undergo any augmentation, so that it faithfully reflects what the "real world" data looks like and the validation metrics stay meaningful. It's also extremely hard to compare models and benchmarks when you mess around with your validation set.
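As a minimal sketch of this setup in Keras (X_train, y_train, X_val, y_val and model are assumed to already exist; all parameter values are illustrative), augmentation is applied on the fly to the training set only, while the validation data is merely rescaled:

from keras.preprocessing.image import ImageDataGenerator

# augmentation for the training images only
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=15,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   horizontal_flip=True)

# validation images are only rescaled, never augmented
val_datagen = ImageDataGenerator(rescale=1./255)

train_gen = train_datagen.flow(X_train, y_train, batch_size=32)
val_gen = val_datagen.flow(X_val, y_val, batch_size=32, shuffle=False)

# model is assumed to be an already compiled Keras model
model.fit(train_gen, validation_data=val_gen, epochs=20)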


You cannot compare validation accuracies if the validation datasets are fundamentally different.

What you were doing initially was validating your model on a limited set of images. Augmentation is applied to make your dataset more representative of your deployment environment.

If in your deployment environment you expect images to be drawn from the same subspace as your augmented dataset, then the validation accuracy of the first case will not be truly representative of your model's real-world performance.
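One way to make the comparison apples to apples (model_small and model_augmented are hypothetical names for the two trained models) is to score both on the same fixed, unaugmented validation set:

# both models are evaluated on the SAME held-out, unaugmented images
loss_small, acc_small = model_small.evaluate(X_val, y_val)
loss_aug, acc_aug = model_augmented.evaluate(X_val, y_val)
print(acc_small, acc_aug)  # now directly comparable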
