Class token in ViT and BERT

I'm trying to understand the architecture in the ViT paper, and noticed that it uses a CLASS token like BERT does. To the best of my understanding, this token is used to gather knowledge of the entire class, and is then solely used to predict the class of the image. My question is: why does this token exist as input in all the transformer blocks, and why is it treated the same as the word/patch tokens? Treating the class token …
Category: Data Science

False positive in Multi class Image classification

I am training a neural network with some convolution layers for multi-class image classification. I am using Keras to build and train the model, with 1600 images across all categories for training and softmax as the final layer's activation function. The model predicts well on all true categories, with high softmax probability. But when I test the model on new or unknown data, it still predicts a class with high softmax probability. How can I reduce that? Should I make …
Category: Data Science
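
The behaviour described in the question above follows partly from how softmax is defined: its outputs always sum to 1 over the known classes, so there is no built-in "none of the above" answer. The sketch below is a minimal illustration in plain Python, not the asker's Keras model. The function names and the 0.9 threshold are assumptions made for the example. It shows a common baseline of rejecting predictions whose top probability is low.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(logits, threshold=0.9):
    """Return the argmax class index, or None if the top probability is below threshold.

    The threshold is a hypothetical value and would need tuning on held-out data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


# One dominant logit gives a confident prediction; near-equal logits are rejected.
confident = softmax([8.0, 1.0, 0.5])
print(predict_with_rejection([8.0, 1.0, 0.5]))
print(predict_with_rejection([1.2, 1.0, 0.9]))
```

A threshold like this only catches inputs on which the network is unsure. It does not help when an unknown input produces one dominant logit, which is the case the question describes, so it is a starting point rather than a full answer.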

PCA, Better performances with 300 components rather than 400 components : why?

I am building a content-based image retrieval system. I extract feature maps of size 1024x1x1 using any backbone, then apply PCA to the extracted features to reduce their dimensionality. I use either nb_components=300 or nb_components=400. I achieved these performances (dim_pca means no PCA applied). Is there any explanation of why k=300 works better than k=400? If I understand correctly, k=400 is supposed to explain more variance than k=300? Is it my mistake or …
Category: Data Science
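
On the variance part of the question above: the fraction of variance kept by the top k components can only grow (or stay equal) as k grows, because covariance eigenvalues are non-negative. The sketch below checks this on a made-up eigenvalue spectrum. The 1/i decay and the function name are assumptions for illustration, not the asker's data.

```python
def cumulative_explained_variance(eigenvalues, k):
    """Fraction of total variance kept by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical spectrum for 1024-dimensional features: variance decays as 1/i.
spectrum = [1.0 / i for i in range(1, 1025)]

ratio_300 = cumulative_explained_variance(spectrum, 300)
ratio_400 = cumulative_explained_variance(spectrum, 400)
print(ratio_300, ratio_400)
```

So k=400 does explain at least as much variance as k=300. Explained variance is not the retrieval metric, though. One possible reason for the observed result is that the extra components mostly carry noise that distorts distances between features.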

How to remove background (watermark) logo from image

I have been scratching my head for a while. What I have is a scanned PDF document with text and a watermarked logo in the background, as in the image below. I want to run OCR over this, which becomes very difficult because of the logo. All the research I've done so far is for coloured images, where a contrast difference can be found. I've hit a wall when solving the same for a B&W image as shown. Would love any …
Category: Data Science

Training Inception V3 based model using Keras with Tensorflow Backend

I am currently training a few custom models that require about 12 GB of GPU memory at most. My setup has about 96 GB of GPU memory, and Python/Jupyter still manages to hog all of it, to the point that I get a "Resource exhausted" error. I have been stuck on this peculiar issue for a while, so any help will be appreciated. Now, when loading a VGG-based model similar to this: from keras.applications.vgg16 import VGG16 …
Category: Data Science

How "similarity" is measured in image retrieval?

I know what content-based image retrieval is. I have read this and this, and one of them says: "given a query images, get a rank list that are most similar to the query image, based on the content of the query image." But my question is how the "similar" images are determined. Assume we are working on the Oxford5k dataset. The dataset contains 5k images in 17 classes. So, when I feed one of the images as a query, …
Category: Data Science
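
One common way to make "similar" concrete for the question above is to represent each image as a feature vector and rank database images by cosine similarity to the query vector. The sketch below assumes that setup. The three-dimensional vectors and the image names are invented for illustration, and real systems would use features from a trained network and may use a different distance.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, database):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature vectors standing in for extracted image descriptors.
database = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
ranking = rank_by_similarity([1.0, 0.05, 0.0], database)
print([name for name, _ in ranking])
```

Class labels are not needed to build the ranking. They are typically used afterwards, to judge whether the top-ranked images are relevant to the query.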

Can we combine two models, one implemented in TensorFlow and the other in PyTorch, to see the results of both simultaneously?

To explain my question further: I am implementing 2 models. The first is for action recognition and the second is for weapon recognition. If a person is punching or kicking someone while carrying a weapon, my system should be able to detect both the action and the weapon in that person's hand simultaneously. This can be useful for security purposes. So I want to combine these 2 models so that it …
Category: Data Science

CNN can't predict images outside the dataset

I am using the CelebA dataset to train my CNN face landmark detection model. Here is my model:

    class LandmarkModel:
        def __init__(self, inp_shape):
            self.model = models.Sequential()
            self.model.add(layers.Conv2D(16, (3, 3), activation='relu', input_shape=inp_shape))  # l1
            self.model.add(layers.Conv2D(32, (3, 3), activation='relu'))
            self.model.add(layers.MaxPooling2D((2, 2)))
            self.model.add(layers.Conv2D(64, (3, 3), activation='relu'))
            self.model.add(layers.Flatten())
            self.model.add(layers.Dense(512))
            self.model.add(layers.Dense(10))

        def getModel(self):
            return self.model

I have trained my model on around 5k-6k images, reaching a loss of 0.1. When I use an image from the dataset that is outside the training sample, I get a correct prediction. But when I use my own clicked …
Category: Data Science

Applying filters to custom objects in an image

I would like to create an application that adds image filters (Snapchat-style) to photos of cats or chairs (just for the sake of this question). In order to do that properly, I thought of using Active Shape Modelling algorithms to have a model to apply the filters to. I trained an object detection model to identify those items in an image (yolov5), so I now have a bounding box around each item, but I still don't know its exact shape …
Category: Data Science

Cable angle measurement (rotation)

I need to detect the rotation of a cable (in degrees) about the x-axis with high precision [0.2 (or more) degree detection] relative to its original state. Detailed description: I have a cable that is set in its original state. The system has rotated the cable about the x-axis. I want to know by how many degrees the cable has been rotated from its original state. Example: here are images of a specific cable at different rotation angles [0, 0.4, 0.6, 0.8]: 1) 2) …
Category: Data Science

How to fetch text from a PDF to feed a question-answering model on the same document?

To illustrate the title: suppose you have a PDF document that was scanned from a hard copy, and there is a fixed set of questions to answer from the document itself. For example, the document contains a land contract, and the fixed questions are "who is the seller?" and "what is the price of the asset?". The document probably refers to these answers 2-3 times, so for a human it's a simple task. How can this be automated?
Category: Data Science

Using large CNNs (e.g., ResNet) in convolutional autoencoders for image representation learning

I am confused about which CNNs are generally used inside autoencoder architectures for learning image representations. Is it more common to use a large existing network like ResNet or VGG, or do most people write their own smaller networks? What are the pros and cons of each? If people are using a large network like ResNet or VGG, does the decoder mirror the same steps taken by the encoder, or can a simpler decoding network be used? I am …
Category: Data Science

I need to plot only training curve in the fastai library using the learner.recorder.plot_losses() function . FASTAI devs pls help

I have a task where I need to plot only the training loss, not the validation loss, using the plot_losses function of the recorder attached to a learner object in the fastai library, but I have not been able to implement this properly. I am using fastai v1 for this purpose due to project restrictions. Here is the GitHub code for the function: class Recorder(LearnerCallback): "A `LearnerCallback` that records epoch, loss, opt and metric data during training." def plot_losses(self, skip_start:int=0, …
Category: Data Science

the size of training data set in the context of computer vision

Generally speaking, when training a machine learning model, the size of the training data set should be bigger than the number of predictors. For a neural network, or even a deep learning model, the number of parameters is usually in the tens of thousands or even millions. It seems that in practice, the size of the training data set, i.e., the number of images, is usually smaller than the number of parameters. How can this be explained? I know, we can claim that the pre-trained …
Category: Data Science
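
To make the parameter counts in the question above concrete, the sketch below applies the standard formulas for a 2D convolution layer and a fully connected layer, each with a bias term. The three-layer network and its sizes are hypothetical, chosen only to show how quickly the total grows.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per output channel of a 2D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels


def dense_params(in_features, out_features):
    """Weights plus one bias per output unit of a fully connected layer."""
    return (in_features + 1) * out_features


# Hypothetical small network: two 3x3 convolutions, then a 10-way classifier
# on a flattened 32-channel 8x8 feature map.
layer_counts = [
    conv2d_params(3, 3, 3, 16),
    conv2d_params(3, 3, 16, 32),
    dense_params(32 * 8 * 8, 10),
]
total_params = sum(layer_counts)
print(layer_counts, total_params)
```

Even this toy network has about 25 thousand parameters, most of them in the final dense layer, so a training set of a few thousand images is already smaller than the parameter count.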

Graph Neural Networks for Segmented Images - Which Nodes do I connect?

I'm facing an interesting problem involving medical images. We have set out to test a hypothesis about whether certain objects in an image affect the diagnosis of a patient. I would love to hear any comments regarding my pipeline, but this is my current approach: Segment the image in order to obtain the objects' regions. This would be done using an off-the-shelf ResNet and labeled data obtained from manual annotation of the images at hand. Now that I have the segmented …
Category: Data Science

MR images segmentation for feature extraction

I have datasets of brain MR images with tumours; the tumours have already been selected manually by a physicist using ImageJ. I have read about segmentation, but I still can't understand how features are extracted from a segmented image. Should the images contain only the tumour on a black background, as shown in the images below, so that feature extraction is run on the whole image? Or are features extracted only from the region of interest using …
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.