I'm trying to understand the architecture of the ViT paper, and noticed they use a CLASS token like in BERT. To the best of my understanding, this token is used to gather information about the entire image and is then solely used to predict the class of the image. My question is: why does this token exist as an input to all the transformer blocks and get treated the same as the word/patch tokens? Treating the class token …
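For concreteness, here is a minimal PyTorch sketch of the mechanism in question (not the actual ViT code; sizes and layer choices are only illustrative): the class token is prepended to the patch embeddings, flows through every block like any other token, and only its final state feeds the classifier.

    import torch
    import torch.nn as nn

    batch, num_patches, embed_dim = 8, 196, 768                 # ViT-Base-like sizes, for illustration
    patch_tokens = torch.randn(batch, num_patches, embed_dim)   # output of the patch-embedding layer
    cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))      # one learnable token, shared across images

    # Prepend the class token; from here on it is treated like any other token.
    tokens = torch.cat([cls_token.expand(batch, -1, -1), patch_tokens], dim=1)   # (8, 197, 768)

    layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=12, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=12)       # stand-in for the ViT blocks
    out = encoder(tokens)

    # Only the class token's final state is used for classification.
    logits = nn.Linear(embed_dim, 1000)(out[:, 0])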
I am training a neural network with some convolution layers for multi-class image classification. I am using Keras to build and train the model. I am using 1600 images for all categories for training. I have used softmax as the final-layer activation function. The model predicts well on all true categories, with high softmax probability. But when I test the model on new or unknown data, it still predicts with high softmax probability. How can I reduce that? Should I make …
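A minimal sketch of one common mitigation, with hypothetical names `model` (the trained Keras model) and `x` (a preprocessed batch): instead of always taking the argmax, reject predictions whose top softmax probability falls below a threshold tuned on held-out data that includes unknown images.

    import numpy as np

    probs = model.predict(x)                   # softmax outputs, shape (n_samples, n_classes)
    confidence = probs.max(axis=1)             # top class probability per sample
    predicted = probs.argmax(axis=1)

    threshold = 0.9                            # tune on a held-out set that includes unknown images
    predicted[confidence < threshold] = -1     # -1 = "unknown / rejected"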
I am building a content-based image retrieval system. I basically extract feature maps of size 1024x1x1 using any backbone. I then apply PCA on the extracted features in order to reduce dimensions, using either nb_components=300 or nb_components=400. I achieved these performances (dim_pca means no PCA applied). Is there any explanation of why k=300 works better than k=400? If I understand correctly, k=400 is supposed to explain more variance than k=300? Is it my mistake or …
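For reference, a minimal scikit-learn sketch of the PCA step (the `features` array, one 1024-d descriptor per database image, is assumed to exist); printing explained_variance_ratio_.sum() for k=300 vs k=400 at least shows how much variance each choice keeps, which is a separate question from retrieval accuracy.

    from sklearn.decomposition import PCA
    from sklearn.preprocessing import normalize

    features = normalize(features)                      # L2-normalise descriptors before PCA
    pca = PCA(n_components=300, whiten=True)            # compare n_components=300 vs 400 here
    reduced = normalize(pca.fit_transform(features))    # re-normalise for cosine/dot-product search

    print(pca.explained_variance_ratio_.sum())          # fraction of variance retained by this k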
I have been scratching my head for a while. What I have is a scanned PDF document with text and a watermarked logo in the background, as in the image below. I want to run OCR over this, which becomes very difficult because of the logo. Everything I have found so far is for coloured images, where a contrast difference can be exploited. I've hit a wall solving the same for a B&W image as shown. Would love any …
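A minimal OpenCV sketch of one thing worth trying, assuming the watermark is noticeably lighter than the printed text ('scan.png' and 'scan_clean.png' are placeholder paths): threshold the page so the light logo is pushed to white before OCR.

    import cv2

    img = cv2.imread('scan.png', cv2.IMREAD_GRAYSCALE)

    # Otsu picks a global threshold; if the logo is lighter than the text it is pushed to white.
    _, clean = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Alternative for unevenly lit scans:
    # clean = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    #                               cv2.THRESH_BINARY, 31, 15)

    cv2.imwrite('scan_clean.png', clean)       # feed this image to the OCR engine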
I am currently training a few custom models that require about 12 GB of GPU memory at most. My setup has about 96 GB of GPU memory, and Python/Jupyter still manages to hog all of it, to the point that I get the "Resource exhausted" error thrown at me. I have been stuck on this peculiar issue for a while, so any help will be appreciated. Now, when loading a VGG-based model similar to this:

    from keras.applications.vgg16 import VGG16 …
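By default TensorFlow/Keras reserves essentially all visible GPU memory at start-up, regardless of what the model needs. A minimal sketch of turning that off, assuming TF 2.x (in TF 1.x / standalone Keras the equivalent is a session ConfigProto with gpu_options.allow_growth=True):

    import tensorflow as tf

    # Must run before any model or tensor touches the GPU.
    for gpu in tf.config.list_physical_devices('GPU'):
        # Allocate GPU memory on demand instead of grabbing the whole card up front.
        tf.config.experimental.set_memory_growth(gpu, True)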
I know what content-based image retrieval is. I have read this and this, as one of them says: "given a query images, get a rank list that are most similar to the query image, based on the content of the query image." But my question is how the "similar" images are determined. Assume we are working on the Oxford5k dataset. The dataset contains 5k images in 17 classes. So, when I feed one of the images as a query, …
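The usual operational definition: every image is mapped to a feature vector, the ranking is by distance (often cosine) in that space, and the dataset's relevance labels are only used afterwards to score the ranking (e.g. with mAP). A minimal sketch, assuming `db_features` (N x d) and `query_feature` (d,) already exist:

    import numpy as np

    def rank_by_cosine(query_feature, db_features):
        q = query_feature / np.linalg.norm(query_feature)
        db = db_features / np.linalg.norm(db_features, axis=1, keepdims=True)
        scores = db @ q                    # cosine similarity to every database image
        return np.argsort(-scores)         # database indices, most similar first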
To explain my question further: I am implementing 2 models, one for action recognition and the second for weapon recognition. If there is a situation where a person is punching or kicking someone while carrying a weapon, my system should be able to detect the action and the weapon simultaneously, if that person is carrying any weapon in hand. This can be useful for security purposes. So I want to combine these 2 models so that it …
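A minimal sketch of the simplest combination, with hypothetical `action_model` and `weapon_model` objects assumed: run both models on the same frame (or clip) and merge their outputs, rather than merging the networks themselves.

    def analyse_frame(frame, action_model, weapon_model):
        # Each model is assumed to expose its own predict() for a single frame/clip.
        action = action_model.predict(frame)       # e.g. "punching", "kicking", "normal"
        weapons = weapon_model.predict(frame)      # e.g. list of weapon boxes + labels
        return {'action': action, 'weapons': weapons}

Whether the two can share a backbone is a separate (multi-task) design question.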
I want to train a model for object detection. How do I have to label the training data? Is it enough to label the class/content of each box in the image, or do I have to add the box position as well? Thank you
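For context on what detection annotations contain: both the class and the box position are needed for every object. As an illustration (made-up numbers), the YOLO-style format is one .txt file per image with one line per object, "class_id x_center y_center width height", all normalised to the image size; for cat.jpg the file cat.txt might look like:

    0 0.512 0.430 0.356 0.620
    2 0.120 0.775 0.180 0.240

Pascal VOC and COCO store the same information as corner coordinates or pixel x/y/width/height instead.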
I am using the CelebA dataset to train my CNN face landmark detection model. Here is my model:

    from keras import layers, models

    class LandmarkModel:
        def __init__(self, inp_shape):
            self.model = models.Sequential()
            self.model.add(layers.Conv2D(16, (3, 3), activation='relu', input_shape=inp_shape))  # l1
            self.model.add(layers.Conv2D(32, (3, 3), activation='relu'))
            self.model.add(layers.MaxPooling2D((2, 2)))
            self.model.add(layers.Conv2D(64, (3, 3), activation='relu'))
            self.model.add(layers.Flatten())
            self.model.add(layers.Dense(512))
            self.model.add(layers.Dense(10))  # 10 outputs = 5 (x, y) landmark coordinates

        def getModel(self):
            return self.model

I have trained my model on around 5k-6k images with a loss of 0.1. When I use an image from the dataset that is outside of the training sample, I get a correct prediction. But when I use my own clicked …
I have a yolov3 model for object detection on 9 classes. What is the difference between computing metrics (such as mAP) on a validation set and on a test set (unseen data)? What is usually done in the literature, and why?
I would like to create an application that adds image filters (Snapchat-style) to photos of cats or chairs (just for the sake of this question). In order to do that properly, I thought of using Active Shape Modelling algorithms to have a model to apply the filters to. I trained an object detection model to identify those items in an image (yolov5), so I now have a bounding box around each item, but I still don't know its exact shape …
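One option for getting an approximate shape out of each box (a sketch, not necessarily the Active Shape Model route): run a class-agnostic segmentation step inside the detected rectangle, e.g. OpenCV's GrabCut. Here `img` is the BGR image and `box` is one yolov5 detection as (x, y, w, h), both assumed to exist:

    import cv2
    import numpy as np

    def box_to_mask(img, box, iters=5):
        mask = np.zeros(img.shape[:2], np.uint8)
        bgd_model = np.zeros((1, 65), np.float64)
        fgd_model = np.zeros((1, 65), np.float64)
        # Everything outside the rectangle is treated as certain background.
        cv2.grabCut(img, mask, box, bgd_model, fgd_model, iters, cv2.GC_INIT_WITH_RECT)
        # Keep pixels labelled (probable) foreground: this is the object's rough shape.
        return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)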
I need to detect the rotation of a cable (in degrees) about the x-axis, relative to its original state, with high precision [detection of 0.2 degrees of rotation or more]. Detailed description: I have a cable that is set in its original state. The system has rotated the cable about the x-axis, and I want to know by how many degrees it has been rotated from its original state. Example: the following images show a specific cable at different rotation angles [0, 0.4, 0.6, 0.8]: …
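Assuming the rotation shows up as an in-plane rotation between a reference photo and the current photo, here is a minimal OpenCV registration sketch (grayscale images `ref` and `cur` are assumed to be loaded); whether it actually reaches 0.2-degree precision depends heavily on texture, resolution and lighting:

    import cv2
    import numpy as np

    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(ref, None)
    k2, d2 = orb.detectAndCompute(cur, None)

    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches])
    dst = np.float32([k2[m.trainIdx].pt for m in matches])

    # Robustly fit rotation + scale + translation between the two views.
    M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    angle_deg = np.degrees(np.arctan2(M[1, 0], M[0, 0]))
    print(angle_deg)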
To illustrate the above title: suppose you have a PDF document, basically scanned from a hardcopy, and there is a set of fixed questions to answer from the document itself. For example, the document contains a land contract, and the fixed questions are "Who is the seller?" and "What is the price of the asset?"; the document mentions these answers maybe 2-3 times, and as a human it's a simple task. How do I automate this?
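A common baseline to sketch (pytesseract for OCR plus an extractive question-answering model from Hugging Face; the page path is a placeholder and deepset/roberta-base-squad2 is just one possible checkpoint): OCR the scanned pages into text, then run each fixed question against that text.

    import pytesseract
    from PIL import Image
    from transformers import pipeline

    # 1) OCR one scanned page into plain text.
    text = pytesseract.image_to_string(Image.open('contract_page1.png'))

    # 2) Ask each fixed question against the OCR'd text.
    qa = pipeline('question-answering', model='deepset/roberta-base-squad2')
    for question in ['Who is the seller?', 'What is the price of the asset?']:
        result = qa(question=question, context=text)
        print(question, '->', result['answer'], f"(score {result['score']:.2f})")

OCR noise is usually the weak link, so cleaning the scan first matters as much as the QA model.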
I am confused about which CNNs are generally used inside autoencoder architectures for learning image representations. Is it more common to use a large existing network like ResNet or VGG, or do most people write their own smaller networks? What are the pros and cons of each? If people are using a large network like ResNet or VGG, does the decoder mirror the same steps taken by the encoder, or can a more simple decoding network be used? I am …
I have a task where I need to plot only the training loss, and not the validation loss, from the plot_losses function in the fastai library (the learner object has the Recorder class), but I am not able to implement this properly. I am using fastai v1 for this purpose due to project restrictions. Here is the GitHub code for it:

    class Recorder(LearnerCallback):
        "A `LearnerCallback` that records epoch, loss, opt and metric data during training."
        def plot_losses(self, skip_start:int=0, …
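In fastai v1 the per-batch training losses are kept on the recorder itself, so one workaround (assuming a fitted `learn` object) is to skip plot_losses entirely and plot them directly with matplotlib:

    import matplotlib.pyplot as plt

    # learn.recorder.losses holds one training-loss tensor per batch in fastai v1.
    train_losses = [float(l) for l in learn.recorder.losses]
    plt.plot(train_losses)
    plt.xlabel('batch')
    plt.ylabel('training loss')
    plt.show()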
I read that a confusion matrix is used with image classification, but if I need to draw one for image captioning, how would I use it? Can I draw it in the model evaluation phase, for example? If yes, how can I start?
If you train your YOLO model only on grayscale images to detect cars, would it also be able to recognise a car in a colour image? If so, can I assume that YOLO considers only object shape, not colour? Kindly clarify.
Generally speaking, for training a machine learning model, the size of the training data set should be bigger than the number of predictors. For a neural network, or even a deep learning model, the number of parameters is usually in the tens of thousands or even millions. It seems that in practice, the size of the training data set, i.e., the number of images, is usually less than the number of parameters. How can this be explained? I know we can claim that the pre-trained …
I'm facing an interesting problem involving medical images. We set out to test the hypothesis that certain objects in an image affect the diagnosis of a patient. I would love to hear any comments regarding my pipeline, but this is my current approach: segment the image in order to obtain the objects' regions, using an off-the-shelf ResNet and labelled data obtained from manual annotation of the images at hand. Now that I have the segmented …
I have datasets of brain MR images with tumours; the tumours have already been selected manually by a physicist using ImageJ. I have read about segmentation, but I still don't understand how features are extracted from a segmented image. Should the images contain only the tumour on a black background, as shown in the images below, so that feature extraction is performed on the whole image? Or are features extracted only on the region of interest using …
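Either way it usually comes down to the same operation: the manual selection gives a binary mask, and features are computed only from the pixels inside that mask, whether or not the rest of the image is blacked out. A minimal numpy sketch, with `image` and `mask` (1 inside the tumour, 0 outside) assumed to be aligned 2-D arrays:

    import numpy as np

    tumour_pixels = image[mask > 0]                    # intensities inside the region of interest only
    features = {
        'mean_intensity': float(tumour_pixels.mean()),
        'std_intensity': float(tumour_pixels.std()),
        'area_px': int(np.count_nonzero(mask)),        # a simple shape feature
    }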