Dataset split for image classification

I am trying to do image classification for 14 categories (around 1,000 images per category). I initially created two folders, one for training and one for validation. In this case, do I still need to set a validation split or a subset in the code, or can I use all of the files as train_ds and val_ds by deleting those settings? The class folder names in the training and validation directories are the same. data_dir = 'trainingdatav1' data_val = 'Validationv1' train_ds = tf.keras.preprocessing.image_dataset_from_directory( data_dir, validation_split=0.1, #is …
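When the training and validation images already live in separate folders, a common pattern is to give each folder its own loader call and leave out `validation_split` and `subset`. Those arguments exist to carve a validation set out of a single directory. Keras infers class labels from subfolder names, which are sorted alphanumerically by default, so both folders must contain identical class folders. The stdlib sketch below checks that and counts files per split. The helper names and the demo layout are hypothetical.

```python
import os
import tempfile


def class_folders(directory):
    """Return the sorted subfolder names, which directory-based loaders use as class names."""
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )


def check_split_dirs(train_dir, val_dir):
    """Raise if the two splits define different classes; otherwise return per-class file counts."""
    train_classes = class_folders(train_dir)
    val_classes = class_folders(val_dir)
    if train_classes != val_classes:
        raise ValueError(f"class folders differ: {train_classes} vs {val_classes}")
    return {
        name: (
            len(os.listdir(os.path.join(train_dir, name))),
            len(os.listdir(os.path.join(val_dir, name))),
        )
        for name in train_classes
    }


if __name__ == "__main__":
    # Hypothetical layout: two classes, 3 training files and 1 validation file each.
    with tempfile.TemporaryDirectory() as root:
        for split, n_files in (("train", 3), ("val", 1)):
            for cls in ("class_a", "class_b"):
                os.makedirs(os.path.join(root, split, cls))
                for i in range(n_files):
                    open(os.path.join(root, split, cls, f"{i}.jpg"), "w").close()
        print(check_split_dirs(os.path.join(root, "train"), os.path.join(root, "val")))
        # prints {'class_a': (3, 1), 'class_b': (3, 1)}
```

If the check passes, each directory can be passed to its own `image_dataset_from_directory` call, for example `data_dir` for `train_ds` and `data_val` for `val_ds`.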
Category: Data Science

Do high accuracy metrics on a small (but equally sampled) dataset mean a good model?

I have been training my CNN with 200 images per class on a binary classification problem. On the test data (25 images per class), I am getting good accuracy, precision, and recall values. Does that mean my model is actually good?
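With 25 test images per class (50 in total), each misclassified image shifts accuracy by 2 percentage points, so the estimate is noisy. The Wilson score interval for a binomial proportion is one standard way to quantify that uncertainty. The 45-of-50 result below is hypothetical and does not come from the question.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion such as test accuracy (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half, center + half


# Hypothetical result: 45 of the 50 test images (25 per class) classified correctly.
low, high = wilson_interval(45, 50)
print(f"accuracy 0.90, 95% interval about {low:.2f} to {high:.2f}")  # about 0.79 to 0.96
```

The interval width shrinks roughly in proportion to one over the square root of the test-set size. So an observed 90% on 50 images is consistent with a model whose true accuracy is anywhere from the high 70s to the mid 90s.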
Category: Data Science

False positives in multi-class image classification

I am training a neural network with several convolutional layers for multi-class image classification, using Keras to build and train the model. I am using 1,600 images for all categories for training, with softmax as the final-layer activation function. The model predicts all of the true categories well, with high softmax probability. But when I test the model on new or unknown data, it still predicts with high softmax probability. How can I reduce that? Should I make …
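A common baseline for flagging inputs that belong to none of the training categories is to threshold the maximum softmax probability and return "unknown" below the threshold. This only helps when unknown inputs actually receive lower top probabilities, which the question suggests is not always the case. The threshold (0.9 here, an arbitrary assumption) would therefore need to be checked on held-out unknown images. The helper names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def predict_with_reject(logits, threshold=0.9, unknown=-1):
    """Return the argmax class, or `unknown` when the top probability is below `threshold`."""
    probs = softmax(np.asarray(logits, dtype=float))
    confidence = probs.max(axis=-1)
    labels = np.where(confidence >= threshold, probs.argmax(axis=-1), unknown)
    return labels, confidence


# Hypothetical logits: one confident prediction, one uniform (uncertain) one.
labels, conf = predict_with_reject([[5.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(labels.tolist())  # [0, -1]
```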
Category: Data Science

How to determine the number of neurons in each hidden layer and the number of hidden layers for face recognition

I plan to build a CNN for face recognition using this Kaggle dataset. I tried building a model with a single hidden layer of 256 fully connected neurons, and it reached an accuracy of 45% after 55 epochs. Should I just treat the number of hidden layers (and the number of neurons in each layer) as variables, and repeat the model evaluation for various values of those variables to find the optimum? Or is there any other, more …
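The approach described in the question is a grid search: treat the layer count and width as variables and re-evaluate the model for each combination. Below is a minimal sketch. It assumes one width shared by all hidden layers and a hypothetical `evaluate` function that trains a model and returns its validation accuracy. Scoring on a validation split rather than the test set keeps the final test estimate unbiased.

```python
import itertools


def grid_search(evaluate, layer_options, neuron_options):
    """Try every (hidden layers, neurons per layer) pair and return the best plus all scores.

    `evaluate` is a caller-supplied (hypothetical) function that builds, trains and
    returns a validation score for one configuration; higher is better.
    """
    results = {}
    for n_layers, n_neurons in itertools.product(layer_options, neuron_options):
        results[(n_layers, n_neurons)] = evaluate(n_layers, n_neurons)
    best = max(results, key=results.get)
    return best, results


# Toy stand-in for training: pretend 2 layers of 256 neurons scores best.
def toy_evaluate(n_layers, n_neurons):
    return -abs(n_layers - 2) - abs(n_neurons - 256) / 256


best, scores = grid_search(toy_evaluate, [1, 2, 3], [128, 256, 512])
print(best)  # (2, 256)
```

Each call to a real `evaluate` means a full training run, so the number of combinations grows quickly. The grid is usually kept coarse at first and refined around the best cell.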
Category: Data Science

My CNN image classification model gives good predictions in all but 2 classes. What should I do?

I built a CNN image classifier for a dataset that contains 6 classes. The dataset is balanced across all 6 classes. After training, the model gives pretty good prediction accuracy in all but 2 classes. To elaborate, let us label these 6 classes with the integers '0' to '5'. The trained model does well at predicting classes '0' to '3'. But about 5%-10% of class '4' images are predicted as class '5', and similarly, 5%-10% of class '5' …
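A row-normalized confusion matrix is the usual first step for checking whether the errors are concentrated in the 4/5 pair and whether they are symmetric. Below is a NumPy sketch on toy labels (not the question's data). The function names are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts images of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def row_rates(cm):
    """Divide each row by its class total, leaving rows with no samples at zero."""
    totals = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, totals, out=np.zeros(cm.shape), where=totals > 0)


# Toy labels in which classes 4 and 5 are confused with each other.
y_true = [0, 1, 2, 3, 4, 4, 4, 5, 5, 5]
y_pred = [0, 1, 2, 3, 4, 4, 5, 5, 5, 4]
rates = row_rates(confusion_matrix(y_true, y_pred, n_classes=6))
print(rates[4, 5], rates[5, 4])  # share of class 4 predicted as 5, and vice versa
```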
Category: Data Science

Image classification problem for minute defect detection

I am tasked with finding defects in a compressor wheel. Here is what a good wheel looks like: [image] Here is what a defective wheel looks like (I have drawn a box around the defective area): [image] As a dataset, I have a continuous video feed of the wheels rotating. I tried learning the "goodness" of a wheel with a fasterrcnn_resnet50_fpn model in PyTorch, but the results were inaccurate. This is what I fed in as training data, with …
Category: Data Science

Extract features from parts of one image

I have several parts of one image that share one caption. I need to do image captioning by evaluating every part of the image that the caption belongs to. Do I need to extract features from each part of the image and pass them to the model along with the caption? If not, how can I do it? For example, my dataset consists of parts of an image that are divided into three regions: “beach, sea, …
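One way to feed several parts of an image to a captioning model is to crop each part, encode each crop separately, and pair the resulting stack of region vectors with the single caption. Below is a minimal sketch. The box coordinates and the `mean_color` stand-in encoder are hypothetical; in practice the encoder would usually be a pretrained CNN.

```python
import numpy as np


def crop_regions(image, boxes):
    """Cut (top, left, bottom, right) boxes out of an H x W x C image array."""
    return [image[top:bottom, left:right] for top, left, bottom, right in boxes]


def mean_color(crop):
    """Placeholder encoder: the mean of each channel. A real pipeline would likely use CNN features."""
    return crop.reshape(-1, crop.shape[-1]).mean(axis=0)


def region_features(image, boxes, encoder=mean_color):
    """Encode every region, giving an (n_regions, feature_dim) array for one caption."""
    return np.stack([encoder(crop) for crop in crop_regions(image, boxes)])


# Hypothetical 6 x 4 RGB image split into three horizontal bands.
image = np.zeros((6, 4, 3))
image[0:2], image[2:4], image[4:6] = 1.0, 2.0, 3.0
boxes = [(0, 0, 2, 4), (2, 0, 4, 4), (4, 0, 6, 4)]
features = region_features(image, boxes)
print(features.shape)  # (3, 3): one feature vector per region
```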
Category: Data Science

Overfitting problem: high training accuracy and low validation accuracy for image classification

I want to define a model to predict 3 categories of images. I'm new to the field :-) I have 1,500 images (500 for each category) in 3 directories. I've read many suggestions on this blog: use a simple loss function, use dropout, use shuffling. I've applied these tricks, but the model still overfits. This is the code I'm using; any suggestions? dim_x = 500 dim_y = 200 dim_kernel = (3,3) data_gen = ImageDataGenerator(rescale=1/255,validation_split=0.3) data_dir = image_path train_data_generator=data_gen.flow_from_directory( data_dir, target_size=(dim_x,dim_y), …
Category: Data Science

Are labels associated with a model or a dataset?

I'm not sure whether I have this backwards, so I'll explain a bit of what is going on. I want to use Unity's Barracuda API to run an ONNX model for classification and detection (depending on the model). Some example projects I've found have a model and labels handy, so it's easy to map the outputs to the labels, i.e., if the 5th element has the highest score, I can look up the 5th label and find it …
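In the usual setup, the labels come from the dataset the model was trained on. The model itself only outputs a vector of scores, and output index i means whatever class was at position i of the training label list. That list therefore has to be shipped alongside the model, often as a text file with one label per line. The lookup is language-independent, so it is sketched here in Python for brevity rather than with Barracuda's C# API. The function and label names are hypothetical.

```python
def load_labels(text):
    """Parse one label per line; line i names output index i."""
    return [line.strip() for line in text.splitlines()]


def top_label(scores, labels):
    """Return (index, label) for the highest score."""
    if len(scores) != len(labels):
        raise ValueError(f"{len(scores)} scores but {len(labels)} labels")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, labels[best]


labels = load_labels("cat\ndog\nbird\n")
print(top_label([0.1, 0.7, 0.2], labels))  # (1, 'dog')
```

The length check catches the most common mismatch, where a labels file from a different dataset is paired with a model.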
Category: Data Science

Dictionary learning for image classification

I'm wondering whether the approach I'm thinking of could even work. I want to use dictionary learning for image classification. The first step would be to learn the dictionary from a set of similar yet different images, so that I can extract the background from an image. For example, I have a set of images (e.g., 500 photos) of the same object, but the scenes differ (lighting, the angle the photo was taken at, etc.). Basically, the main object is the …
Category: Data Science

CIFAR-100: What is the difference between vehicles 1 and vehicles 2?

The superclasses in the CIFAR-100 dataset are mutually exclusive, and all but the vehicle ones are quite well defined by their labels. Example: it is very clear why bees belong to the superclass insects and to none of the other superclasses. This does not appear to be the case for the two superclasses vehicles 1 and vehicles 2: the two do not seem to be clearly separable if we ignore their subclass labels. Example: It is not clear why pickup trucks belong to …
Category: Data Science

Epoch 1/5 won't stop

When I run my code with 5 epochs, it gets stuck at the first epoch and runs continuously. I tried various parameters but couldn't get it to work. Here is my code: import tensorflow as tf import numpy as np import os from google.colab import drive from tensorflow.keras.preprocessing.image import ImageDataGenerator from tensorflow import keras from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D training_path = 'drive/My Drive/tesla' validation_path = 'drive/My Drive/validation' training_generator = ImageDataGenerator(rescale=1.0/255) validation_generator = ImageDataGenerator(rescale=1.0/255) training_set = training_generator.flow_from_directory(batch_size=2, directory=training_path, target_size=(150,150), class_mode='binary', …
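Since the rest of the code is cut off, the following is only one possible explanation. `flow_from_directory` returns an iterator whose length is the number of images divided by `batch_size`, rounded up. When `steps_per_epoch` is not set, Keras runs that many steps per epoch. With `batch_size=2` and images read from a mounted Google Drive, the first epoch may simply take a long time rather than hang. The arithmetic below uses a hypothetical image count.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Batches needed to see every image once; the last batch may be partial."""
    return math.ceil(num_images / batch_size)


for batch_size in (2, 32):
    # Hypothetical training set of 2,000 images.
    print(batch_size, steps_per_epoch(2000, batch_size))
# prints "2 1000" then "32 63"
```

Watching the step counter in the progress bar (for example `12/1000`) distinguishes a slow epoch from a genuinely stuck one.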
Category: Data Science

Is it reliable to use TensorFlow (ML in general) to classify baggage bag tags based on the presence of a green stripe?

The images are identical except for the presence of the stripe on the side. I am trying to classify the images into 2 classes: greenStripe and noGreenStripe. I tried TensorFlow retrain with a small dataset (~40 pictures per class, batch size 8), but the results were really bad. I am hesitant to commit to training with more data, as it is time-consuming. What do you suggest? Is there a better approach or does …
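Because the images are identical apart from the stripe, a plain color rule may be worth trying before more training: measure the fraction of strongly green pixels, optionally only in the region where the stripe appears. This is a non-ML heuristic, not a TensorFlow method, included for comparison. The `margin`, `min_fraction` and region values are hypothetical and would need tuning on the existing ~80 photos.

```python
import numpy as np


def green_fraction(rgb, margin=40):
    """Share of pixels whose green channel exceeds both red and blue by `margin`."""
    rgb = np.asarray(rgb, dtype=np.int16)  # widen from uint8 so subtraction cannot wrap around
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return float(((g - r > margin) & (g - b > margin)).mean())


def has_green_stripe(rgb, region=None, min_fraction=0.05, margin=40):
    """Classify as greenStripe when enough green pixels appear in the (optional) region."""
    if region is not None:
        top, left, bottom, right = region
        rgb = np.asarray(rgb)[top:bottom, left:right]
    return green_fraction(rgb, margin) >= min_fraction


# Synthetic 10 x 10 gray tag, then the same tag with a 2-pixel green stripe on the left side.
tag = np.full((10, 10, 3), 128, dtype=np.uint8)
striped = tag.copy()
striped[:, :2] = (0, 200, 0)
print(has_green_stripe(tag), has_green_stripe(striped))  # False True
```

If a rule like this separates the two classes on the photos already collected, it needs no training data at all. If lighting varies too much for a fixed rule, that is a sign a learned model is worth the extra data.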
Category: Data Science

Designing a pretrained DNN for image similarity

I am pretty new to deep learning and really hope that you can help me. I want to write a Python program that lets me choose an area in a reference image. This subimage of variable size should then be used to search a database of images, and the parts of the images with the highest similarity to the reference subimage should be returned. However, I have big problems with the sizing of the reference and database images. …
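For the search step, a classical baseline is normalized cross-correlation (template matching), used here in place of a DNN. It slides the selected subimage over each database image and scores every position. It only finds matches at the same scale and orientation, which is exactly where the sizing problem in the question arises. One common option is to apply the same matching to CNN feature maps or to several rescaled copies of the subimage. The sketch assumes grayscale NumPy arrays and needs NumPy 1.20+ for `sliding_window_view`. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ncc_map(image, template):
    """Normalized cross-correlation score of `template` at every position where it fits."""
    image = np.asarray(image, dtype=float)
    t = np.asarray(template, dtype=float)
    t = t - t.mean()
    windows = sliding_window_view(image, t.shape)  # (H - h + 1, W - w + 1, h, w)
    w = windows - windows.mean(axis=(-2, -1), keepdims=True)
    num = (w * t).sum(axis=(-2, -1))
    den = np.sqrt((w ** 2).sum(axis=(-2, -1)) * (t ** 2).sum())
    # Flat windows (zero variance) get a score of 0 instead of dividing by zero.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def best_match(image, template):
    """Return ((row, col), score) of the top-left corner with the highest score."""
    scores = ncc_map(image, template)
    row, col = np.unravel_index(scores.argmax(), scores.shape)
    return (int(row), int(col)), float(scores[row, col])


rng = np.random.default_rng(0)
database_image = rng.random((20, 20))       # hypothetical grayscale database image
reference = database_image[5:9, 7:12]       # hypothetical selected subimage
print(best_match(database_image, reference))  # location (5, 7), score close to 1.0
```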
Category: Data Science

The size of the training dataset in the context of computer vision

Generally speaking, when training a machine learning model, the training dataset should be larger than the number of predictors. For a neural network, and especially a deep learning model, the number of parameters is usually in the tens of thousands or even millions. In practice, however, the size of the training dataset, i.e., the number of images, is usually smaller than the number of parameters. How can this be explained? I know we can claim that the pre-trained …
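The "tens of thousands or millions" figure follows from simple counting. A convolution layer has kernel_h × kernel_w × in_channels × out_channels weights plus one bias per output channel. A dense layer has inputs × outputs weights plus one bias per output. The sketch below applies this to a hypothetical small CNN. It assumes a 32x32 RGB input and two parameter-free 2x2 pooling layers, so the flattened size is 8 × 8 × 64.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights of a 2-D convolution plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def dense_params(n_inputs, n_outputs):
    """Weights of a fully connected layer plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


# Hypothetical small CNN for 32x32 RGB images (pooling layers have no parameters).
layers = [
    conv2d_params(3, 3, 3, 32),     # 896
    conv2d_params(3, 3, 32, 64),    # 18,496
    dense_params(8 * 8 * 64, 128),  # 524,416
    dense_params(128, 10),          # 1,290
]
print(sum(layers))  # 545098
```

Because convolution kernels are shared across all positions in the image, the convolution layers stay small. In this example most of the count sits in the first dense layer.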
Category: Data Science

Single-image feature reduction at inference time: SVM

I am trying to train an SVM classifier using scikit-learn. At training time, I want to reduce the feature vector dimension, so I used PCA: pp = PCA(n_components=400).fit(features) features = pp.transform(features) PCA requires an m x n dataset to determine the variance, but at inference time I have only a single image and its corresponding 1D feature vector. I am wondering how to reduce the feature vector at inference time to match the training dimension. Or …
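The variance is only needed while fitting. A fitted PCA keeps the training mean and the projection matrix, so reducing one new vector is just "subtract the mean, multiply by the components", which works for a batch of one. The NumPy re-implementation below illustrates this with a hypothetical `SimplePCA` class and random data; it is not scikit-learn's code. With scikit-learn, the equivalent is to keep the fitted `pp` and call `pp.transform(feature.reshape(1, -1))`, where `feature` is a hypothetical name for the single image's vector. The reshape is needed because `transform` expects a 2-D array.

```python
import numpy as np


class SimplePCA:
    """Minimal PCA: fit stores the mean and top principal directions; transform projects onto them."""

    def fit(self, X, n_components):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # Rows of vt are principal directions ordered by decreasing variance.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[:n_components]
        return self

    def transform(self, X):
        # A single 1-D feature vector becomes shape (1, n_features); no variance is computed here.
        X = np.atleast_2d(np.asarray(X, dtype=float))
        return (X - self.mean_) @ self.components_.T


rng = np.random.default_rng(0)
train_features = rng.normal(size=(50, 10))  # hypothetical: 50 training images, 10 features each
pca = SimplePCA().fit(train_features, n_components=3)
single = train_features[0]                  # one image's 1-D feature vector at "inference"
print(pca.transform(single).shape)          # (1, 3)
```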
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.