Images from the training set are different from images in the test set

Question

J.D.

December 16, 2019 05:28

I am doing image classification with a CNN, and my training set and test set have different distributions. To try to overcome this problem I am thinking about standardizing the images with ImageDataGenerator, but I am encountering some problems. Here is the part of the code I am working on:

trainingset = '/content/drive/My Drive/Colab Notebooks/Train'
testset = '/content/drive/My Drive/Colab Notebooks/Test'



batch_size = 32
train_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rescale = 1. / 255,
    zoom_range=0.1,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=False)

train_datagen.fit(trainingset);

train_generator = train_datagen.flow_from_directory(
    directory=trainingset,
    #target_size=(256, 256),
    color_mode="rgb",
    batch_size=batch_size,
    class_mode="categorical",
    shuffle=True
)

test_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rescale = 1. / 255)

test_datagen.fit(testset);

test_generator = test_datagen.flow_from_directory(
    directory=testset,
    #target_size=(256, 256),
    color_mode="rgb",
    batch_size=batch_size,
    class_mode="categorical",
    shuffle=False
)

num_samples = train_generator.n
num_classes = train_generator.num_classes
input_shape = train_generator.image_shape

classnames = [k for k,v in train_generator.class_indices.items()]



print("Image input %s" %str(input_shape))
print("Classes: %r" %classnames)

print('Loaded %d training samples from %d classes.' % 
(num_samples,num_classes))
print('Loaded %d test samples from %d classes.' % 
(test_generator.n,test_generator.num_classes))

So, what I am trying to do is use the ImageDataGenerator fields featurewise_center=True and featurewise_std_normalization=True to do standardization, but if I try to fit the generator to the training set by doing train_datagen.fit(trainingset); I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-16-28e4ebb819be> in <module>()
     23     vertical_flip=False)
     24 
---> 25 train_datagen.fit(trainingset);
     26 
     27 train_generator = train_datagen.flow_from_directory(

1 frames
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

ValueError: could not convert string to float: '/content/drive/My Drive/Colab Notebooks/Train'

Can somebody please help me? Thanks in advance.

[EDIT] I am trying to adapt what is written here to my problem.

[EDIT_2] I think the problem is that .fit() takes a NumPy array as its parameter, while I am passing it a string, which is the path to the images.

But now I don't understand how to proceed, because I would have to turn the images into a NumPy array in order to do the fit.

Topic neural cnn image-classification machine-learning

Category Data Science

Danny · Accepted Answer · December 14, 2019 06:34

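The way you are using ImageDataGenerator is wrong. The .fit() method expects the image data itself as a NumPy array, but you are passing it the directory path, which is a string; that is why NumPy fails with "could not convert string to float". To be able to run your code, remove the train_datagen.fit(trainingset) call (test_datagen.fit(testset) fails for the same reason) and your script should run fine, with the images preprocessed by the generator as they are read from disk. Note that the two featurewise_* options only take effect after a successful fit(); without one, Keras warns and skips them. Here's a link on how to use the ImageDataGenerator.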
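If the feature-wise statistics are still wanted, fit() needs actual pixel data rather than a path. A minimal sketch, assuming the images (or a sample of them) fit in memory: collect_images is a hypothetical helper, not part of Keras, that stacks batches from any iterator into one array. The commented usage at the end is an assumption about how it would plug into the question's code and has not been run against it.

```python
import numpy as np


def collect_images(batches, max_images=1000):
    """Stack image batches into one float32 array of shape (N, H, W, C).

    `batches` is any iterator of image batches, for example a Keras
    directory iterator. Hypothetical helper, not part of Keras.
    """
    collected, count = [], 0
    for batch in batches:
        # With class_mode=None an iterator yields images only; otherwise it
        # yields (images, labels), so keep just the images.
        images = batch[0] if isinstance(batch, (tuple, list)) else batch
        images = np.asarray(images, dtype="float32")
        collected.append(images)
        count += len(images)
        # Keras directory iterators loop forever, so stop explicitly.
        if count >= max_images:
            break
    return np.concatenate(collected, axis=0)[:max_images]


# Intended use with the question's code (assumption, untested there):
#
#   raw_batches = ImageDataGenerator().flow_from_directory(
#       trainingset, class_mode=None, batch_size=32, shuffle=True)
#   train_datagen.fit(collect_images(raw_batches, max_images=1000))
```

Since flow_from_directory resizes every image to target_size (256x256 by default), the batches share one shape and can be concatenated. It is worth checking that the fitted statistics are on the same scale as the rescaled images, because how fit() treats rescale may depend on the Keras version.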

The .flow_from_directory() method reads raw image files from a directory, whereas the .fit() method works on numerical data, that is, an array of images that have already been loaded.
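To make concrete what fit() works out from such a numerical array, here is a rough NumPy sketch of feature-wise standardization: a per-channel mean and standard deviation computed over the whole training array, then applied to both sets. The random arrays are synthetic stand-ins for real images, and the function names are made up for illustration. Reusing the training statistics for the test set is the usual practice, not something the original code does (it fits the test generator separately); in Keras this would roughly correspond to copying train_datagen.mean and train_datagen.std onto test_datagen after fitting.

```python
import numpy as np


def featurewise_stats(images, eps=1e-6):
    """Per-channel mean and std over an array of shape (N, H, W, C)."""
    mean = images.mean(axis=(0, 1, 2), keepdims=True)
    std = images.std(axis=(0, 1, 2), keepdims=True)
    # The small epsilon guards against division by zero for constant channels.
    return mean, std + eps


def standardize(images, mean, std):
    """Center and scale images with previously computed statistics."""
    return (images - mean) / std


rng = np.random.default_rng(0)
# Synthetic stand-ins for real images, already scaled to [0, 1].
train = rng.uniform(0.0, 1.0, size=(64, 16, 16, 3))
test = rng.uniform(0.2, 0.8, size=(16, 16, 16, 3))

# Statistics come from the training set only and are reused for the test set.
mean, std = featurewise_stats(train)
train_std = standardize(train, mean, std)
test_std = standardize(test, mean, std)

print("train mean/std per channel:", train_std.mean(axis=(0, 1, 2)), train_std.std(axis=(0, 1, 2)))
print("test  mean/std per channel:", test_std.mean(axis=(0, 1, 2)), test_std.std(axis=(0, 1, 2)))
```

Standardizing this way does not make two different distributions identical: the test set will generally not end up with zero mean and unit variance, but both sets pass through the same transformation the model was trained with.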
