Dataset split for image classification
I am trying to do image classification with 14 categories (around 1,000 images per category), and I initially created two separate folders for training and validation. In this case, do I still need to set a validation split or a subset in the code, or can I use all the files as train_ds and val_ds by deleting those arguments?
The class subfolder names in the training and validation directories are the same.
import tensorflow as tf
from tensorflow.keras import layers

img_height, img_width = 180, 180   # placeholder values
batch_size = 32                    # placeholder value

data_dir = 'trainingdatav1'
data_val = 'Validationv1'

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.1,   # is this required if I'm going to use the whole folder for training?
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_val,
    validation_split=0.8,   # need to check
    subset="validation",
    seed=455,
    image_size=(img_height, img_width),
    batch_size=batch_size)
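If the split arguments aren't needed, I assume the loading would reduce to something like this (a minimal sketch based on my folder layout, since each directory is already a complete split):

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    image_size=(img_height, img_width),
    batch_size=batch_size)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_val,
    shuffle=False,   # keep the validation batches in a fixed order
    image_size=(img_height, img_width),
    batch_size=batch_size)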
num_classes = 14

model = tf.keras.Sequential([
  layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)),

  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),

  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),

  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Dropout(0.2),   # prevent overfitting
  layers.Flatten(),
  layers.Dense(128, activation='sigmoid'),
  layers.Dense(num_classes)   # logits; softmax is applied by the loss via from_logits=True
])
model.compile(optimizer='SGD',   # could also try 'adam'
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.summary()

epochs = 50
history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs
)
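One way to quantify the gap between training and validation performance is to plot the curves from the history object above (a quick sketch, assuming matplotlib is installed):

import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(acc, label='train accuracy')
plt.plot(val_acc, label='val accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(loss, label='train loss')
plt.plot(val_loss, label='val loss')
plt.legend()
plt.show()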
Another question is about overfitting: my validation accuracy never goes above 0.4 and val_loss stays around 2.xxx. Suggestions from Stack Exchange are (a sketch of the dropout/L2/augmentation ideas follows this list):
- Reduce the number of layers in the neural network.
- Reduce the number of neurons in each layer to cut down the number of parameters.
- Add dropout and tune its rate.
- Use L2 regularisation on the parameter weights and tune the lambda value.
- If possible, add more data for training.
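To make the dropout, L2, and more-data suggestions concrete, here is a minimal sketch of how they could be wired into this kind of model; the dropout rate, lambda, and augmentation settings below are illustrative assumptions rather than tuned values, with augmentation standing in for collecting more real data:

import tensorflow as tf
from tensorflow.keras import layers, regularizers

data_augmentation = tf.keras.Sequential([
    layers.experimental.preprocessing.RandomFlip(
        'horizontal', input_shape=(img_height, img_width, 3)),
    layers.experimental.preprocessing.RandomRotation(0.1),
    layers.experimental.preprocessing.RandomZoom(0.1),
])

model = tf.keras.Sequential([
    data_augmentation,   # random transforms, active only during training
    layers.experimental.preprocessing.Rescaling(1./255),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.3),   # tune this rate
    layers.Flatten(),
    layers.Dense(64, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),   # tune the lambda value
    layers.Dense(num_classes)
])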
 
Are there any other suggestions?
Topic validation overfitting image-classification dataset
Category Data Science