I am training a VGG net on the STL-10 dataset. I am getting a Top-5 validation accuracy of about 98% and a Top-1 validation accuracy of about 83%, but both the Top-1 and Top-5 training accuracy reach 100%. Does this mean that the network is over-fitting, or not? Code:

def conv2d(inp, name, kshape, s):
    with tf.variable_scope(name) as scope:
        kernel = get_weights('weights', shape=kshape)
        conv = tf.nn.conv2d(inp, kernel, [1, s, s, 1], 'SAME')
        bias = get_bias('biases', shape=kshape[3])
        preact = tf.nn.bias_add(conv, bias)
        convlayer = tf.nn.relu(preact, name=scope.name)
    return convlayer

def maxpool(inp, name, k, s):
    return tf.nn.max_pool(inp, ksize=[1, k, k, 1], strides=[1, s, s, 1], padding='SAME', name=name)

def loss(logits, labels):
    labels = tf.reshape(tf.cast(labels, tf.int64), [-1])
    # print labels.get_shape().as_list(), logits.get_shape().as_list()
    cross_entropy …
In the config file, the VGG layer weights are initialized this way:

from easydict import EasyDict as edict
MODEL = edict()
MODEL.VGG_LAYER_WEIGHTS = dict(conv3_4=1/8, conv4_4=1/4, conv5_4=1/2)

But how do I initialize them using a parser? I have tried the following:

parser.add_argument('--VGG_LAYER_WEIGHTS', type=dict, default=conv3_4=1/8, conv4_4=1/4, conv5_4=1/2, help='VGG_LAYER_WEIGHTS')

but got an error. Please help me to write it correctly.
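One option (a sketch, not the only way) is to pass the whole mapping as a single JSON string and let argparse convert it; argparse cannot accept a bare dict literal as default/type the way it was written above. The flag name matches the config key; everything else is illustrative:

import argparse
import json

parser = argparse.ArgumentParser()
# Usage: --VGG_LAYER_WEIGHTS '{"conv3_4": 0.125, "conv4_4": 0.25, "conv5_4": 0.5}'
parser.add_argument('--VGG_LAYER_WEIGHTS', type=json.loads,
                    default={'conv3_4': 1/8, 'conv4_4': 1/4, 'conv5_4': 1/2},
                    help='per-layer VGG weights as a JSON dict')

args = parser.parse_args()
print(args.VGG_LAYER_WEIGHTS['conv4_4'])   # 0.25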
I want to use VGG16 (or VGG19) for a voice clustering task. I read some articles which suggest using VGG (16 or 19) to build the embedding vector for the clustering algorithm. The process is to convert the wav file into an MFCC or a plot (amplitude vs. time) and use this as input to the VGG model. I tried it out with VGG19 (and weights='imagenet'). I got bad results, and I assumed it is because I'm using VGG with the wrong weights …
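For reference, a minimal sketch of the pipeline being described (librosa for a log-mel "image", ImageNet weights for the embedding); the file name and sizes are placeholders, and nothing here claims ImageNet weights are appropriate for spectrograms:

import numpy as np
import librosa
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input

# Load audio and compute a log-mel spectrogram as a 2-D "image" of the clip.
y, sr = librosa.load('clip.wav', sr=16000)                 # 'clip.wav' is a placeholder
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=224)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Crop/pad to 224 frames and repeat the single channel 3 times to fit VGG's input.
mel_db = mel_db[:, :224] if mel_db.shape[1] >= 224 else np.pad(mel_db, ((0, 0), (0, 224 - mel_db.shape[1])))
img = np.stack([mel_db] * 3, axis=-1)[None, ...]            # shape (1, 224, 224, 3)

# Pooled convolutional features serve as the embedding for the clustering step.
model = VGG19(weights='imagenet', include_top=False, pooling='avg')
embedding = model.predict(preprocess_input(img))            # shape (1, 512)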
I have medical images and need to extract features from the layer before the classification layer using VGG, for example, but the resolution of the images is not sufficient... Will the features be unaffected if I don't improve this resolution, or do I need to improve the resolution before extracting the features? I was preprocessing colour images for feature extraction with VGG like this:

preprocess = T.Compose([
    T.Resize(256, interpolation=3),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, …
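For reference, a minimal sketch of pulling the penultimate-layer (4096-d) features from torchvision's VGG16, assuming a PIL image img and the preprocess pipeline above:

import torch
import torchvision.models as models

vgg = models.vgg16(weights='IMAGENET1K_V1')     # or pretrained=True on older torchvision
vgg.eval()

# Drop the final classification layer so the output is the 4096-d penultimate activation.
feature_extractor = torch.nn.Sequential(
    vgg.features,
    vgg.avgpool,
    torch.nn.Flatten(),
    *list(vgg.classifier.children())[:-1],
)

with torch.no_grad():
    x = preprocess(img).unsqueeze(0)            # img: a PIL image, preprocess as above
    feats = feature_extractor(x)                # shape (1, 4096)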
I have merged two different models, namely VGG16 and ResNet50, and fed the outputs of the two models as input to another model. I have checked that the layer graph is correct. Before merging, the code ran perfectly fine and gave correct outputs. Now I am getting the error "ValueError: Shapes (None, None) and (None, 7, 7, 3) are incompatible" on line 6:

ValueError Traceback (most recent call last)
<ipython-input-36-620554d0106f> in <module>()
      4     epochs = 200,
      5     validation_data = validation_generator,
----> …
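The (None, 7, 7, 3) shape suggests the merged model still ends in a spatial feature map rather than a flat class vector, so the loss cannot match it against the labels. A common pattern (a sketch with an assumed input size and class count, not your exact code) is to pool each backbone's output before concatenating and adding the classification head:

from tensorflow.keras.applications import VGG16, ResNet50
from tensorflow.keras.layers import Input, GlobalAveragePooling2D, Concatenate, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(224, 224, 3))                       # assumed input size
vgg = VGG16(weights='imagenet', include_top=False)(inp)
res = ResNet50(weights='imagenet', include_top=False)(inp)

# Collapse the 7x7 spatial maps to vectors before merging the two streams.
merged = Concatenate()([GlobalAveragePooling2D()(vgg), GlobalAveragePooling2D()(res)])
out = Dense(3, activation='softmax')(merged)           # 3 classes assumed

model = Model(inp, out)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])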
I am confused about which CNNs are generally used inside autoencoder architectures for learning image representations. Is it more common to use a large existing network like ResNet or VGG, or do most people write their own smaller networks? What are the pros and cons of each? If people are using a large network like ResNet or VGG, does the decoder mirror the same steps taken by the encoder, or can a simpler decoding network be used? I am …
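For concreteness, a minimal sketch of the second option: a frozen VGG16 encoder paired with a small decoder that does not mirror the encoder layer by layer (all sizes are illustrative):

import torch.nn as nn
import torchvision.models as models

class VGGAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Pretrained VGG16 convolutional stack as the encoder (frozen here).
        self.encoder = models.vgg16(weights='IMAGENET1K_V1').features   # (B, 512, 7, 7) for 224x224 input
        for p in self.encoder.parameters():
            p.requires_grad = False
        # A deliberately simple decoder: upsamples 7x7 -> 224x224 without mirroring VGG.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(),   # 14x14
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),   # 28x28
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),    # 56x56
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),     # 112x112
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),   # 224x224
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))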
I want to use the 3rd layer's output of the VGG16 network, but I get the error below:

UserWarning: Model inputs must come from `keras.layers.Input` (thus holding past layer metadata), they cannot be the output of a previous non-Input layer. Here, a tensor specified as input to your model was not an Input tensor, it was generated by layer input_1. Note that input tensors are instantiated via `tensor = keras.layers.Input(shape)`. The tensor that caused the issue was: input_1:0 str(x.name))
Traceback (most …
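The usual way to avoid that warning is to build the sub-model from the VGG16 model's own input tensor rather than from a freshly created tensor; a minimal sketch (the layer index/name is illustrative):

from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model

vgg = VGG16(weights='imagenet', include_top=False)

# Reuse vgg.input so the sub-model's input really is a Keras Input tensor.
third_layer_output = vgg.layers[3].output          # or vgg.get_layer('block1_pool').output
feature_model = Model(inputs=vgg.input, outputs=third_layer_output)

features = feature_model.predict(images)           # images: a preprocessed batch (N, H, W, 3)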
I'm currently doing a project about CNNs, but I'm quite confused because they can be used both to classify and to extract features. According to the Faster R-CNN paper, it uses a ResNet backbone. I have also seen that you can use, for example, VGG16 with Faster R-CNN to classify, let's say, types of vegetables. Does that mean that when I implement it this way, it uses 2 CNNs in total, namely ResNet for extracting features of ROIs and then VGG for …
My task is to cluster some images. I decided to use the VGG model to extract the features and then use K-Means to cluster these features. But my question: when I use VGG as a feature extractor, should I make sure the VGG model was trained on this type of image before? Otherwise the VGG model is not generalizable to all types of images, am I right? I am looking for a general method to cluster images regardless …
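For reference, a minimal sketch of that pipeline with ImageNet weights (the cluster count and the images array are placeholders); whether these features transfer well to an arbitrary image type is exactly the open question raised above:

import numpy as np
from sklearn.cluster import KMeans
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

# images: a numpy array of shape (N, 224, 224, 3), already loaded and resized
extractor = VGG16(weights='imagenet', include_top=False, pooling='avg')
features = extractor.predict(preprocess_input(images.astype('float32')))   # (N, 512)

kmeans = KMeans(n_clusters=5, random_state=0).fit(features)                # 5 clusters assumed
print(kmeans.labels_)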
I am solving a 3-class classification problem (the task is to recognise whether the picture shows a panda, a cat, or a dog). The dataset consists of 3000 images. To solve the problem, I use a slightly modified VGG architecture: After 200 epochs I got the following quality: The problem requires reaching >= 85% accuracy on the validation set. To be honest, I don't have any particular ideas yet. Can you …
I would just like to get the class names of the predictions. I can get the class names for the images that I trained the model on, but if I predict an image that was not in my training data but already belongs to the pre-trained model (VGG16), I cannot get its class name. My scenario: I used pre-trained VGG16 and added new datasets (like logos, stars, hills, etc., which are not present in VGG16, say). Now I trained this model in …
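One way to recover names on both sides (a sketch; train_generator, fine_tuned_model, and the batch x are assumed to exist and to use the same preprocessing): the fine-tuned classes come from the data generator's class_indices, while the original 1000 ImageNet classes come from the stock VGG16 plus decode_predictions:

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, decode_predictions

# x: an image batch of shape (1, 224, 224, 3), preprocessed as during training

# Names for the fine-tuned classes: invert the generator's class_indices mapping.
idx_to_class = {v: k for k, v in train_generator.class_indices.items()}
pred = fine_tuned_model.predict(x)
print(idx_to_class[int(np.argmax(pred[0]))])

# Names for the original ImageNet classes: use the stock VGG16 and decode_predictions.
imagenet_model = VGG16(weights='imagenet')
print(decode_predictions(imagenet_model.predict(x), top=3))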
I am using transfer learning to train a binary image classification model using Keras' pretrained VGG16 model. The code can be found below:

training_dir = '/Users/rishabh/Desktop/CyberBoxer/data/train'
validation_dir = '/Users/rishabh/Desktop/CyberBoxer/data/validation'
image_files = glob(training_dir + '/*/*.jpg')
valid_image_files = glob(validation_dir + '/*/*.jpg')

# importing the libraries
from keras.models import Model
from keras.layers import Flatten, Dense
from keras.applications import VGG16
#from keras.preprocessing import image

IMAGE_SIZE = [64, 64]  # we will keep the image size as (64,64). You can increase the size for …
I'm trying to reproduce a piece of research with greyscale images instead of colour images. I have found that there are pre-trained networks, like VGG16, trained on ImageNet. But that dataset has colour images, and I can't use it because I'm going to use greyscale images. Is there any network pre-trained on greyscale images? Failing that, I could also train a network on a greyscale image dataset, but I can't find one.
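A common workaround, rather than hunting for greyscale pre-trained weights, is to keep the ImageNet VGG16 and either repeat the single channel three times or collapse the first conv layer's RGB filters into one channel; a sketch of both in PyTorch (details are illustrative):

import torch
import torch.nn as nn
import torchvision.models as models

vgg = models.vgg16(weights='IMAGENET1K_V1')

# Option A: repeat the greyscale channel at load time:
#   x = x.repeat(1, 3, 1, 1)   # (N, 1, H, W) -> (N, 3, H, W)

# Option B: replace the first conv so it accepts 1 channel,
# initialising it with the sum of the pretrained RGB filters.
old = vgg.features[0]                                      # Conv2d(3, 64, kernel_size=3, padding=1)
new = nn.Conv2d(1, 64, kernel_size=3, padding=1)
with torch.no_grad():
    new.weight.copy_(old.weight.sum(dim=1, keepdim=True))  # (64, 1, 3, 3)
    new.bias.copy_(old.bias)
vgg.features[0] = new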
I'm trying to implement VGG11 (Model A of Table 1 from this article) on the MNIST dataset, but I'm getting ~10% train & test accuracy (as bad as random guessing). I had to resize the MNIST data from 28x28 to 32x32 to fit the CNN architecture. This is what I did:

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten
from keras import optimizers, utils
from PIL import Image, ImageFilter
import numpy as np …
My model has been performing poorly recently, and I was wondering what I can do to bolster its performance. So far its training accuracy is low, and validation accuracy is constant and not improving.

import copy
import matplotlib.pyplot as plt
import numpy as np
import os
from scipy.ndimage.filters import gaussian_filter
from skimage.transform import rescale
from sklearn.feature_extraction.image import img_to_graph
import shutil
import time
import tensorflow as tf
from tensorflow.compat.v1.losses import softmax_cross_entropy as SCE
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.callbacks …
I have read the architecture of the model, but this is the first time I have tried to use it. The computed feature maps will be different depending on whether I extract the features from the last two layers or only from the last layer, but will that difference matter if I use the features in another model?
I found that batch norm can cause problems with small batch sizes and that GroupNorm is a good alternative. Now, GroupNorm requires two parameters: num_groups and num_channels. How can I choose a good value for num_groups? What does it depend on? And with GroupNorm, is a big batch size better or a small one?
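For reference, the default in the GroupNorm paper is 32 groups (falling back to fewer when the channel count is small), and num_groups must evenly divide num_channels; normalisation happens per sample, so it does not depend on the batch size. A minimal PyTorch sketch with an illustrative channel count:

import torch
import torch.nn as nn

channels = 64
# num_groups must divide num_channels; G=32 is the paper's default.
gn = nn.GroupNorm(num_groups=min(32, channels), num_channels=channels)

x = torch.randn(2, channels, 56, 56)    # behaves the same for batch size 2 or 256
y = gn(x)
print(y.shape)                          # torch.Size([2, 64, 56, 56])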
I've constructed a simple VGG16-style model from the original Simonyan & Zisserman paper for use on a DBT (Digital Breast Tomosynthesis) data challenge. As a starter model, I chose to create the 3x244x244 image patches via pre-processing (DICOM loading was pretty slow) and save them to disk. For a 3-label classification setup, with ~2000 image patches for training, it produces okay results (~80% accuracy). As part of the refinement, more patches were added with several data augmentations, which resulted …
I would like to create 3 different VGGs with a shared classifier. Basically, each of these architectures has only the convolutions, and then I combine all the nets with a classifier. For a better explanation, see this image: I have no idea how to do this in PyTorch. Do you have any examples I can study? Is this a case of weight sharing? Edit: my actual code. Do you think it is correct?

class VGGBlock(nn.Module):
    def __init__(self, in_channels, …
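As a point of comparison, a minimal sketch of three VGG-style convolutional branches feeding one shared classifier (branch and layer sizes are illustrative and do not reuse your VGGBlock):

import torch
import torch.nn as nn
import torchvision.models as models

class TripleVGG(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Three separate convolutional feature extractors (no weight sharing between branches).
        self.branches = nn.ModuleList(
            [models.vgg16(weights=None).features for _ in range(3)]
        )
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        # One classifier shared by all three branches: their pooled features are concatenated.
        self.classifier = nn.Sequential(
            nn.Linear(3 * 512 * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x1, x2, x3):
        feats = [self.pool(b(x)).flatten(1) for b, x in zip(self.branches, (x1, x2, x3))]
        return self.classifier(torch.cat(feats, dim=1))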