Context: We work on medical image segmentation. There are many potential labels for one and the same region we segment: medically defined labels like anatomical regions, more biological labels like tissue types, or spatial labels like left/right. Many labels can be further differentiated into (hierarchical) sub-labels. Clarification: The question is about the number of classes / target labels used in multi-label classification/segmentation. It is not about the …
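For reference, one way several labels can apply to the same region is a multi-hot mask with one binary channel per label; a minimal sketch (the label names below are placeholders, not from the question), typically trained with a per-channel sigmoid and binary cross-entropy rather than a softmax over mutually exclusive classes:

    import numpy as np

    # hypothetical label set: anatomical, tissue-type and spatial labels can all apply to one pixel
    labels = ["liver", "parenchyma", "left"]
    h, w = 128, 128
    multi_hot = np.zeros((len(labels), h, w), dtype=np.uint8)

    # a single region (here a square) can be "on" in several channels at once
    multi_hot[0, 30:90, 30:90] = 1   # anatomical label
    multi_hot[1, 30:90, 30:90] = 1   # tissue label
    multi_hot[2, 30:90, 30:90] = 1   # spatial label

    print(multi_hot[:, 50, 50])      # -> [1 1 1]: the same pixel carries three labels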
I have this image, which is an output from my object detection model. I wanted to apply segmentation to this image so that my mask looks like that. I used the GrabCut algorithm, but the results were too bad. Here's my code:

    import cv2
    import numpy as np

    img = cv2.imread(testpath + imgname)
    mask = np.zeros(img.shape[:2], np.uint8)
    bgModel = np.zeros((1, 65), np.float64)
    fgModel = np.zeros((1, 65), np.float64)
    tmpimage = img.copy()
    masks = []
    for i in recs:  # recs: list of (x, y, w, h) detection rectangles
        cv2.grabCut(img, mask, i, bgModel, fgModel, 5, cv2.GC_INIT_WITH_RECT)
        # background / probable-background pixels -> 0, the rest -> 255
        mask2 = np.where((mask == 2) | (mask == 0), 0, 255).astype('uint8')
        masks.append(mask2)
        # img=image*mask2[:,:,np.newaxis]
    # merge the per-rectangle masks into a single mask
    finalmask = np.zeros(img.shape[:2], np.uint8)
    for i in range(len(masks)):
        finalmask = cv2.bitwise_or(finalmask, masks[i])  # avoids uint8 wrap-around from plain addition
    # for i in range(len(finalmask)):
    #     for j in range(len(finalmask[i][:])):
    #         for k in recs:
    #             if i < k[0] …
I'm building an OCR system to read text off of water meters. I'm running into the error mentioned above when I try to fit the model. I am using the segmentation_models Python library.

    import segmentation_models as sm
    from sklearn.model_selection import train_test_split

    BACKBONE = 'resnet34'
    preprocess_input = sm.get_preprocessing(BACKBONE)

    x_train, y_train, x_val, y_val = train_test_split(X, y, test_size=0.2, random_state=12345)
    x_train = preprocess_input(x_train)
    x_val = preprocess_input(x_val)

    model = sm.Unet(BACKBONE, encoder_weights='imagenet', encoder_freeze=True)
    model.compile('Adam', loss=sm.losses.bce_jaccard_loss, metrics=[sm.metrics.iou_score])
    model.fit(x=x_train, y=y_train, batch_size=16, epochs=10, validation_data=(x_val, y_val))

'X' represents the images …
I am new to ResNet models. I want to implement a ResNet-50 model for semantic segmentation. I am following the code from this video, but my numclasses is 21. I have a few questions: If I pass any RGB JPEG image into the model, I get an output of size (1, 21). What does this output represent? Since I am doing semantic segmentation, my images don't have any RGB channels, so what should I put for image_channels in self.conv1? …
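For reference, a plain classification ResNet-50 ends in global pooling plus a fully connected layer, so it returns one score per class for the whole image, which is where a (1, 21) output comes from; a segmentation network keeps the spatial dimensions and predicts one score per class per pixel. A minimal torchvision sketch of the shape difference (assuming a 3-channel input; for a different channel count the first conv's in_channels would change):

    import torch
    from torchvision.models import resnet50
    from torchvision.models.segmentation import fcn_resnet50

    x = torch.randn(1, 3, 224, 224)   # one RGB image

    clf = resnet50(num_classes=21)    # classification head
    print(clf(x).shape)               # torch.Size([1, 21]): one score per class for the whole image

    seg = fcn_resnet50(num_classes=21)        # ResNet-50 backbone with a segmentation head
    print(seg(x)['out'].shape)                # torch.Size([1, 21, 224, 224]): one score per class per pixel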
I've implemented a SegNet and a SegNet ReLU variant in PyTorch. I'm using it as a proof of concept for now, but what really bothers me is the noise produced by the network. With Adam I seem to get slightly less noise, whereas with SGD the noise increases. I can see the loss going down and the cross-validation accuracy rising to 98%-99%, and yet the noise is still there. On the left is the actual image, then you can see the mask, and …
I am working on a semantic segmentation problem, a 5-class task. When I run the validation function and output my probability maps, I find that for the background class (the extra class nnUNet adds, class 0) the probabilities always go up to nearly 1, even after many epochs. But the probabilities of the other foreground classes (class 1 to class 6) never reach 1.0. At least I can still recognize the outline of the target, but …
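For context, a generic note (not nnUNet internals): with a per-pixel softmax the class probabilities at each pixel sum to 1, so a pixel the network confidently calls background necessarily gets low probabilities for every foreground class; foreground probabilities near 1 are only expected on pixels the network assigns to that class. A small sketch with made-up logits:

    import torch
    import torch.nn.functional as F

    # hypothetical logits for one pixel (background followed by a few foreground classes)
    logits = torch.tensor([4.0, 1.0, 0.5, 0.2, -0.3])
    probs = F.softmax(logits, dim=0)

    print(probs.sum())   # tensor(1.) - the classes compete for probability mass
    print(probs[0])      # background close to 1 -> all foreground probabilities are pushed down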
I am new to Deep Learning but have been able to use RasterVision successfully to predict building footprints within a set of aerial imagery. This aerial imagery data set is for a province of New Zealand. Now that I have a model that predicts successfully in this province, I am interested in how I could use this to predict in the many other regions of New Zealand. The problem is these regions are captured with differing sensors, resolution and with …
It's very straightforward for binary semantic segmentation: black (0) represents the background, whereas white (1) represents the objects of interest. But what about multiclass semantic segmentation? As far as I understand, these masks must be RGB images, since we use more than two colors. Is that correct? Or should I have a separate binary mask for every class? If I can use RGB images with multiple colors as masks, should I use some specific colors for …
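For reference, one common convention (a sketch with an arbitrary color-to-class mapping) is to store multiclass masks as a single-channel image of class indices, and to convert a color-coded RGB mask into that form, or into one binary channel per class, before training:

    import numpy as np

    # hypothetical color legend: which RGB color encodes which class
    color_to_class = {
        (0, 0, 0): 0,        # background
        (255, 0, 0): 1,      # class 1
        (0, 255, 0): 2,      # class 2
    }

    rgb_mask = np.zeros((4, 4, 3), dtype=np.uint8)
    rgb_mask[1:3, 1:3] = (255, 0, 0)             # a small "class 1" square

    # single-channel class-index mask: one integer label per pixel
    index_mask = np.zeros(rgb_mask.shape[:2], dtype=np.int64)
    for color, cls in color_to_class.items():
        index_mask[np.all(rgb_mask == color, axis=-1)] = cls

    # one-hot / per-class binary masks, shape (num_classes, H, W)
    num_classes = len(color_to_class)
    one_hot = np.stack([(index_mask == c).astype(np.uint8) for c in range(num_classes)])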
I'm working on a 3D mesh dataset, and I have to label it to train my deep learning model for a segmentation task, like the picture shows. I spent days looking for a tool to label my 3D data, but unfortunately I found nothing. What tool can I use to label my data (3D meshes, .obj files) for a segmentation task?
I am new to data science and working on a segmentation model. Basically, I need to deploy this segmentation model on Android devices using TensorFlow Lite for real-time camera-frame segmentation. I used a U-Net model to do that, but could not get the accuracy I wanted. After exploring a lot I found something about video segmentation, but I am a bit confused: how is video segmentation different from normal image segmentation? Can somebody explain the differences between the two?
I have sample images containing stones. I need to identify only the visible stones. The approach I tried is threshold-based filtering and contour detection (cv2.findContours). I am also looking into the ENet architecture for a semantic-segmentation-based deep learning approach. The sample images are below. Example image 1: Example image 2: The code I tried for contour-based detection is below:

    import os
    import cv2
    import numpy as np

    image = cv2.imread(os.path.join(img_path, img_name2))
    # threshold based customization
    lower_bound = np.array([0, 0, 0])
    upper_bound = …
From here, it says that: "Techniques to solve instance segmentation can be roughly grouped into two categories: proposal-based methods and proposal-free methods. In proposal-based methods, a set of object proposals and their classes are first predicted, then foreground-background segmentation in each bounding box is performed. The proposal-free approaches exclude the step of proposal generation." What is a "proposal" in this context? Also, how does one "first predict their classes"? There is not much explanation of this topic on the internet, and I …
I'm trying to synthesize nail art onto hand pictures. These are the 3 steps I'm trying to do:
1. take hand pictures
2. select options like color, cubic, etc.
3. synthesize

And the way I thought to solve this is:
1. get the nail contour from a trained U-Net model, with a dataset of (hand pics, hand pics with the nail area painted)
2. make a synthetic nail-art image with a trained pix2pix model, with a dataset of (nail-art pics, semantic images including the nail-art options)
3. composite the nail-art image onto the hand picture (a rough compositing sketch is below)

I'm wondering whether …
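For the last step, mask-based compositing can be as simple as copying the generated nail-art pixels wherever the predicted nail mask is on; a rough sketch with made-up variable names (real code would also need to align the nail-art render with the nail region first):

    import numpy as np

    def composite(hand_img, nailart_img, nail_mask):
        """Paste nail-art pixels onto the hand image where the mask is 1.

        hand_img, nailart_img: HxWx3 uint8 arrays of the same size
        nail_mask: HxW array, 1 inside the predicted nail region, 0 elsewhere
        """
        mask3 = nail_mask[..., np.newaxis].astype(np.float32)   # HxWx1, broadcast over channels
        out = hand_img * (1 - mask3) + nailart_img * mask3      # simple hard blend
        return out.astype(np.uint8)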
I have original images of size 1935x1481. I am using labelme to annotate the images, creating polygons on the original image. Is there a way to resize the images along with their masks? I am planning to use TFOD Mask R-CNN, and I know it will resize the image, but what happens to the mask?
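In case it helps, a sketch of the usual approach (not TFOD-specific): resize the image, scale the polygon coordinates by the same factors, and if you rasterize the polygons into a mask, resize that mask with nearest-neighbor interpolation so class values are not blurred:

    import cv2
    import numpy as np

    def resize_with_polygons(image, polygons, new_w, new_h):
        """polygons: list of (N, 2) arrays of (x, y) points, e.g. from the labelme JSON."""
        h, w = image.shape[:2]
        sx, sy = new_w / w, new_h / h
        resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
        scaled_polys = [np.asarray(p, dtype=np.float32) * [sx, sy] for p in polygons]
        return resized, scaled_polys

    # if you already have a rasterized mask, use nearest-neighbor so labels stay integers:
    # mask_resized = cv2.resize(mask, (new_w, new_h), interpolation=cv2.INTER_NEAREST)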
I'm very new to data science, and was admiring how people had made these massive open-source datasets on places like Kaggle. I noticed that all of the datasets were in CSV format. I have lots of images that I'd like to upload to Kaggle for everyone to use, but I don't know how to convert my images to CSV. (I can't upload them as individual images because there is a limit of 1000 files, which is not enough for a …
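One common pattern (like the MNIST-style CSV datasets) is one row per image with the pixel values flattened out; a small sketch, assuming a folder of equally sized grayscale images (the folder and file names here are made up):

    import csv
    from pathlib import Path

    import numpy as np
    from PIL import Image

    image_dir = Path("images")   # hypothetical folder of same-size images

    with open("images.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for path in sorted(image_dir.glob("*.png")):
            pixels = np.asarray(Image.open(path).convert("L")).flatten()
            writer.writerow([path.name, *pixels.tolist()])   # filename followed by pixel values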
I applied a U-Net model that was built for Oxford Pet segmentation to a crack segmentation project. Without transfer learning, the model works fine for pet segmentation but not for crack segmentation. What could be the reason? I know there is code for crack segmentation with U-Net, but I want to learn why the code for pets doesn't work well for cracks. Thanks in advance.

    def double_conv_block(x, n_filters):
        x = layers.Conv2D(n_filters, 3, padding="same", activation="relu", kernel_initializer="he_normal")(x)
        x = layers.Conv2D(n_filters, 3, padding="same", activation="relu", kernel_initializer="he_normal")(x)
        return x

    def downsample_block(x, n_filters):
        f = double_conv_block(x, n_filters)
        p = layers.MaxPool2D(2)(f)
        p = layers.Dropout(0.3)(p)
        return f, p

    def upsample_block(x, conv_features, n_filters):
        x = layers.Conv2DTranspose(n_filters, 3, 2, padding="same")(x) …
I am doing lesion segmentation for multiple sclerosis (MS), and at the moment I am using an attention U-Net for my thesis. The best validation Dice score I have received is 0.771, with 0.84 on training. I am thinking of doing some post-processing to remove some FP and FN in order to enhance the predictions. Any advice? Currently I am using opening and closing, and I am not sure if this is the right approach.
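One frequently used addition to opening and closing is connected-component filtering: drop predicted lesion blobs below a minimum size to cut false positives. A sketch (the min_voxels threshold is a made-up value to tune against your validation set):

    import numpy as np
    from scipy import ndimage

    def drop_small_components(binary_pred, min_voxels=20):
        """Remove connected components smaller than min_voxels from a binary prediction."""
        labeled, num = ndimage.label(binary_pred)                       # id each connected blob
        sizes = ndimage.sum(binary_pred, labeled, index=range(1, num + 1))
        keep = [i + 1 for i, s in enumerate(sizes) if s >= min_voxels]  # blob ids large enough to keep
        return np.isin(labeled, keep).astype(binary_pred.dtype)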
Background: I am a PhD student trying to improve my data science skills. One of my research projects has me tasked with determining the size of the clusters in a colored image of regions. Here is an example image I am using. The coloring is natural, as it represents the orientation of the microscope light. The light hits the surface in different ways, resulting in the different colors. But I'm not trying to sum regions of similar colors, but instead just …
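If measuring region sizes is the goal, one generic route (a sketch only, since the rest of the question is cut off; it assumes the image can first be turned into a binary or label map, e.g. by thresholding or clustering the colors) is connected-component labeling followed by per-region pixel counts:

    import numpy as np
    from skimage import measure

    # binary_regions: hypothetical HxW array, 1 where a cluster was segmented, 0 elsewhere
    binary_regions = np.zeros((64, 64), dtype=np.uint8)
    binary_regions[5:15, 5:15] = 1
    binary_regions[30:50, 30:45] = 1

    labeled = measure.label(binary_regions)              # give each connected cluster its own integer id
    sizes_px = [r.area for r in measure.regionprops(labeled)]
    print(sizes_px)                                      # [100, 300] - cluster sizes in pixels
    # multiply by the physical area of one pixel to get real-world sizes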
I'm working on a project in which I need to build a form recognizer that, given a form image, returns the key-value pairs. As I have just gotten started, I wanted to hear some opinions about what I should try. Some questions that I have in mind: What models work best for this kind of input and output? What features should be fed into that model? What would be the ideal size of the training dataset? Please feel free to …