Extract a segment from a document scan

I need to extract some "valuable" information from a document scan, for example the document number, incoming date, organizations, persons, etc. In the example document (image not included), I'm trying to extract the highlighted segment; the original scan doesn't have that highlighting. The value can be handwritten or typewritten. I tried U-Net and Mask R-CNN on my dataset (~100 examples) without any success. Any ideas?
Category: Data Science

When using a model like VGG16 as a classifier within Faster RCNN, does Faster RCNN then use 2 CNNs in total?

I'm currently doing a project about CNNs, but I'm quite confused because they can be used both to classify and to extract features. Faster RCNN is often used with a ResNet backbone (the original Faster RCNN paper used ZF and VGG-16). I have also seen that you can use, for example, VGG16 with Faster RCNN to classify, let's say, types of vegetables. Does that mean that when I implement it this way, it uses 2 CNNs in total, namely ResNet for extracting features of ROIs and then VGG for …
Category: Data Science

Training Object Detection model on just 10 images

I am trying to train an object detection model using Mask R-CNN with a ResNet-50 backbone, using the pre-trained models from PyTorch's torchvision library. I have only 10 images that I can use for training, and of those same 10 images I am using 3 for validation. For evaluation, I am using the COCO evaluation method, which is also provided as .py scripts in torchvision's GitHub repository. To have enough samples for training, I …
Category: Data Science
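A common way to get more training samples out of a handful of images is augmentation. Every geometric change to the image must also be applied to its targets. The sketch below covers only the box half of a horizontal flip. It assumes boxes in [x1, y1, x2, y2] pixel coordinates, the format torchvision detection models expect; hflip_boxes is a hypothetical helper.

The image itself would be flipped separately, for example with torch.flip(img, dims=[-1]). For Mask R-CNN, the masks must be flipped the same way.

```python
def hflip_boxes(boxes, image_width):
    """Mirror [x1, y1, x2, y2] boxes left-to-right for an image of the given width."""
    flipped = []
    for x1, y1, x2, y2 in boxes:
        # The old right edge becomes the new left edge, and vice versa.
        flipped.append([image_width - x2, y1, image_width - x1, y2])
    return flipped

print(hflip_boxes([[10, 20, 50, 60]], image_width=100))  # [[50, 20, 90, 60]]
```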

Difference between RRPN and R2CNN

I have been working on rotated object detection with Faster R-CNN on aerial imagery for some time and have encountered two different approaches for producing rotated bounding boxes. The first approach modifies the RPN of Faster R-CNN to produce inclined bounding boxes and then applies rotated bounding-box regression to refine the final boxes, as explained here. The second approach uses the RPN to generate axis-aligned boxes and adds an additional regression branch to the classification head of Faster …
Category: Data Science
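Both approaches end up with rotated boxes, commonly parameterised as (cx, cy, w, h, θ). The sketch below converts that five-parameter form into its four corner points, which you need for drawing or for polygon IoU. It assumes θ is in degrees and rotates the width axis from +x toward +y in image coordinates. Libraries differ in their sign and range conventions, so check the one you use. rbox_to_corners is a hypothetical helper.

```python
import math

def rbox_to_corners(cx, cy, w, h, theta_deg):
    """Return the 4 corners of a rotated box, in order around the box."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    corners = []
    # Half-extent offsets in the box's own frame, visited in order.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the offset by theta, then translate it to the box centre.
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners

print(rbox_to_corners(5, 2, 10, 4, 0))
```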

What is the difference between a bounding box and an ROI (Region of Interest)?

I was reading about Fast R-CNN for object detection. From what I understand, it uses pre-computed ROIs (from selective search), uses them to predict bounding-box offsets, and uses a smooth L1 loss to refine these so they get closer to the ground-truth boxes. The paper states the following about the ROIs: while training, R/N ROIs are taken from each image (N = 2, R = 128), where N is the number of images per mini-batch. Among the ROIs chosen, around 25% of them …
Category: Data Science
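The ROI is the input proposal. The bounding box is what the network outputs after applying its predicted offsets to that ROI. The sketch below shows the standard R-CNN box parameterisation used by Fast R-CNN:

- encode turns a ground-truth box into regression targets (tx, ty, tw, th) relative to an ROI.
- decode applies offsets to an ROI to produce a box.
- iou is the overlap test Fast R-CNN uses to mark an ROI as foreground (IoU ≥ 0.5).

Boxes are assumed to be (x1, y1, x2, y2) with continuous coordinates. The helper names are hypothetical.

```python
import math

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def encode(roi, gt):
    """Regression targets (tx, ty, tw, th) that move `roi` onto `gt`."""
    pw, ph = roi[2] - roi[0], roi[3] - roi[1]
    px, py = roi[0] + pw / 2, roi[1] + ph / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gx, gy = gt[0] + gw / 2, gt[1] + gh / 2
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def decode(roi, t):
    """Apply offsets t to an ROI, giving the output bounding box."""
    pw, ph = roi[2] - roi[0], roi[3] - roi[1]
    px, py = roi[0] + pw / 2, roi[1] + ph / 2
    cx, cy = px + t[0] * pw, py + t[1] * ph
    w, h = pw * math.exp(t[2]), ph * math.exp(t[3])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

roi, gt = (10, 10, 50, 50), (12, 8, 56, 52)
t = encode(roi, gt)           # what the regressor is trained to predict for this ROI
print(iou(roi, gt), decode(roi, t))
```

Smooth L1 is then applied between the predicted offsets and these targets, and only for foreground ROIs.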

Can I load my own weights?

Full code source:

```
# Download COCO pre-trained weights
!wget --quiet https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5
!ls -lh mask_rcnn_coco.h5

COCO_WEIGHTS_PATH = "mask_rcnn_coco.h5"
init_with = "coco"  # "coco" or "last"

if init_with == "coco":
    # Load COCO weights, skipping the layers whose shapes depend on the number of classes
    model.load_weights(COCO_WEIGHTS_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
    # Load the last model you trained and continue training
    # (depending on the matterport version, find_last() returns (dir, path) or just the path)
    model.load_weights(model.find_last()[1], by_name=True)
```

Can I load my own .h5 file? For example, I interrupted my kernel after 5 epochs. Can I load the weights from my last epoch, and will training then continue from where it stopped? Could you explain how this works?
Category: Data Science
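model.load_weights(path, by_name=True) accepts any .h5 file whose layer names match, including your own checkpoints. In the matterport implementation, checkpoints are saved once per epoch inside a timestamped log directory, with names like mask_rcnn_<config name>_<epoch:04d>.h5. Loading one of them from that directory sets the model's epoch counter from the file name. A later model.train(..., epochs=N) therefore trains up to epoch N in total.

The stdlib-only sketch below picks the newest checkpoint, much as model.find_last() does within the model's own log directory. Two things are assumptions: latest_checkpoint is a hypothetical helper, and the naming pattern follows matterport's default, so adjust the regex if yours differs.

```python
import os
import re

# Assumed matterport naming: mask_rcnn_<name>_<4-digit epoch>.h5
CKPT_RE = re.compile(r"mask_rcnn_[\w-]+?_(\d{4})\.h5$")

def latest_checkpoint(log_dir):
    """Return (path, epoch) of the highest-epoch checkpoint under log_dir, or None."""
    best = None
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            m = CKPT_RE.search(name)
            if m:
                epoch = int(m.group(1))
                if best is None or epoch > best[1]:
                    best = (os.path.join(root, name), epoch)
    return best
```

With the result, call model.load_weights(path, by_name=True), then call model.train with epochs set to the total you want (for example epoch + 5). Here model, dataset_train and dataset_val come from your own notebook.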

Custom Class Using PyTorch Faster-RCNN Model not working

I have been using the pre-trained Faster R-CNN ResNet-50 PyTorch model in my project. When I define my get_detection() function (below) in the same file where I call it, it works fine, and inference works on any image I use as input:

```python
import torchvision
from torchvision import transforms as T
import os

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign', …
```
Category: Data Science
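In eval mode, torchvision's detection models return one dict per input image with "boxes", "labels" and "scores". The sketch below shows the post-processing step a get_detection()-style helper usually performs: it maps label indices to names and drops low-scoring detections.

filter_predictions and the 0.5 threshold are hypothetical choices, not part of torchvision. The helper works on plain Python lists, so call .tolist() on the tensors first. It takes the prediction and the class list as arguments, so it does not depend on globals such as COCO_INSTANCE_CATEGORY_NAMES being defined in the file that imports it.

```python
def filter_predictions(pred, class_names, threshold=0.5):
    """Turn one torchvision-style prediction dict into (name, box, score) tuples.

    pred["boxes"] is [[x1, y1, x2, y2], ...]; pred["labels"] and pred["scores"]
    are parallel lists of the same length.
    """
    detections = []
    for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
        if score >= threshold:
            detections.append((class_names[label], box, score))
    return detections

names = ["__background__", "person", "bicycle", "car"]
pred = {"boxes": [[0, 0, 10, 10], [5, 5, 20, 20]], "labels": [1, 3], "scores": [0.9, 0.3]}
print(filter_predictions(pred, names))  # [('person', [0, 0, 10, 10], 0.9)]
```

With a real model, use pred = model([img])[0] and then pred = {k: v.tolist() for k, v in pred.items()} before calling the helper.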

Object Detection: Unusual warning while training Detectron2 Faster R-CNN

I am trying to train a Detectron2 faster_rcnn_R_50_FPN_3x model, pretrained on the PubLayNet dataset, on a custom dataset. While training, I get the following warnings:

WARNING [01/14 14:35:22 fvcore.common.checkpoint]: Skip loading parameter 'roi_heads.box_predictor.cls_score.weight' to the model due to incompatible shapes: (7, 1024) in the checkpoint but (6, 1024) in the model! You might want to double check if this is expected.
WARNING [01/14 14:35:22 fvcore.common.checkpoint]: Skip loading parameter 'roi_heads.box_predictor.cls_score.bias' to the model due to incompatible shapes: (7,) in the checkpoint …
Category: Data Science
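In Detectron2's standard box predictor (FastRCNNOutputLayers), the classifier has NUM_CLASSES + 1 outputs, the extra one being background. The class-specific box regressor has 4 × NUM_CLASSES outputs. Applying that rule to the warning:

- (7, 1024) in the checkpoint means it was trained with 6 classes.
- (6, 1024) in your model means cfg.MODEL.ROI_HEADS.NUM_CLASSES is 5.

The sketch below spells out that arithmetic; the helper names are hypothetical. If the class count differs on purpose, the warning is expected: those layers are skipped and freshly initialised, while every other weight is still loaded.

```python
def box_head_shapes(num_classes, feat_dim=1024, cls_agnostic=False):
    """Expected weight shapes of Detectron2's FastRCNNOutputLayers."""
    reg_classes = 1 if cls_agnostic else num_classes
    return {
        "cls_score.weight": (num_classes + 1, feat_dim),  # +1 for background
        "cls_score.bias": (num_classes + 1,),
        "bbox_pred.weight": (4 * reg_classes, feat_dim),  # 4 box deltas per class
        "bbox_pred.bias": (4 * reg_classes,),
    }

def classes_in_checkpoint(cls_score_rows):
    """Invert the rule above: rows of cls_score.weight minus the background row."""
    return cls_score_rows - 1

print(classes_in_checkpoint(7), classes_in_checkpoint(6))  # 6 5
```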

How to arrange ground-truth for anchor box representation in object detection

I am working on the CharGrid and BERTGrid papers and have questions about the bounding-box regression decoder. The CharGrid paper states that this branch has two outputs: one with 2Na outputs and one with 4Na outputs. The first indicates whether or not there is an object in the box, and the second gives the four box coordinates; Na is the number of anchor boxes per pixel. I understand everything up to this point. However, let's say Na is …
Category: Data Science
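The excerpt does not spell out the exact assignment rule, so the sketch below uses the usual RPN-style scheme under three assumptions. Check each against CharGrid's own description:

- Anchors are centred on each pixel.
- An IoU of at least 0.5 with a ground-truth box makes an anchor positive.
- Channels are laid out as stated in the docstring.

build_targets is a hypothetical helper. Each pixel gets 2Na objectness channels (a one-hot background/object pair per anchor) and 4Na box channels (offsets, only for positive anchors).

```python
import math

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def build_targets(height, width, anchors, gt_boxes, pos_thr=0.5):
    """Per-pixel targets: 2*Na objectness channels and 4*Na box channels.

    anchors: list of (w, h) sizes, centred on each pixel centre.
    Assumed channel layout: anchor a uses objectness channels [2a, 2a+1]
    = [background, object] and box channels [4a .. 4a+3].
    """
    na = len(anchors)
    obj = [[[0.0] * (2 * na) for _ in range(width)] for _ in range(height)]
    box = [[[0.0] * (4 * na) for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            cx, cy = x + 0.5, y + 0.5
            for a, (aw, ah) in enumerate(anchors):
                anchor = (cx - aw / 2, cy - ah / 2, cx + aw / 2, cy + ah / 2)
                best = max(gt_boxes, key=lambda g: iou(anchor, g), default=None)
                if best is not None and iou(anchor, best) >= pos_thr:
                    obj[y][x][2 * a + 1] = 1.0  # "object" channel
                    gw, gh = best[2] - best[0], best[3] - best[1]
                    box[y][x][4 * a:4 * a + 4] = [
                        ((best[0] + best[2]) / 2 - cx) / aw,
                        ((best[1] + best[3]) / 2 - cy) / ah,
                        math.log(gw / aw),
                        math.log(gh / ah),
                    ]
                else:
                    obj[y][x][2 * a] = 1.0  # "background" channel
    return obj, box

obj, box = build_targets(4, 4, anchors=[(2, 2), (4, 1)], gt_boxes=[(0.5, 0.5, 2.5, 2.5)])
print(obj[1][1], box[1][1])
```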

Feature Map setup for Faster RCNN with resnet50 backbone

I'm trying to get an activation map from a Faster RCNN ResNet-50 backbone, but I'm having trouble setting up the proper hook to capture the output. Most libraries, like gradcam, don't seem to have built-in support for Faster RCNN. I think the Faster RCNN flow requires something extra, but I can't figure out what I need to hook into the model. I've concentrated on layer 4, as it's called out in numerous tutorials (which …
Category: Data Science

Which model is used for document extraction (CamScanner, Microsoft Lens, etc.)?

I want to start a small project where I'd create one or more models that extract a document from a picture and rescale it, as apps like CamScanner or Microsoft Lens do. I've gathered a small dataset just to prototype the concept, but I'm not sure what the best approach to labelling the data might be. Using bounding boxes might work best to locate the document, but it would add some noise, since the picture might be under …
Category: Data Science
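The noise a bounding-box label brings can be measured directly if each document is instead labelled with its four corners (a quadrilateral): compute how much of the axis-aligned box is actually document. The stdlib-only sketch below does this with the shoelace formula. Two things are assumptions: the corner-label format is one possible annotation scheme, and corners must be listed in order around the document. The helper names are hypothetical.

```python
def polygon_area(pts):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

def bbox_of(pts):
    """Axis-aligned (x1, y1, x2, y2) box around a set of points."""
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)

def document_fill_ratio(corners):
    """Fraction of the axis-aligned box that the 4-corner document actually covers."""
    x1, y1, x2, y2 = bbox_of(corners)
    return polygon_area(corners) / ((x2 - x1) * (y2 - y1))

print(document_fill_ratio([(0, 0), (10, 0), (10, 10), (0, 10)]))  # 1.0
print(document_fill_ratio([(5, 0), (10, 5), (5, 10), (0, 5)]))    # 0.5 (page rotated 45°)
```

Four ordered corners are also exactly what a perspective warp, such as OpenCV's getPerspectiveTransform, needs to rescale the page.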

Is it possible to pass in an empty annotation to signify just a background/negative image for faster RCNN?

I'm using a pretrained ResNet-50 Faster RCNN to detect areas with 2 classes (background and a class of interest). As part of my training data, I have background images without any annotation boxes. I've tried setting the box coordinates to all zeros and also giving it an area of 0. When I start training, it fails with the following error: All bounding boxes should have positive height and width. Found an invalid box [0.0,0.0,0.0,0.0] for target …
Category: Data Science
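The error comes from a sanity check in torchvision's GeneralizedRCNN, which rejects boxes with zero width or height, so an all-zero placeholder box fails. Recent torchvision versions support negative images directly: give the target a boxes tensor of shape (0, 4) and an empty labels tensor. The sketch below shows this; make_target is a hypothetical helper you would call from your dataset's __getitem__.

```python
import torch

def make_target(boxes, labels):
    """Build a torchvision detection target; an image with no objects gets (0, 4) boxes."""
    if len(boxes) == 0:
        # Negative/background image: no boxes at all, rather than a zero-area box.
        return {"boxes": torch.zeros((0, 4), dtype=torch.float32),
                "labels": torch.zeros((0,), dtype=torch.int64)}
    return {"boxes": torch.as_tensor(boxes, dtype=torch.float32),
            "labels": torch.as_tensor(labels, dtype=torch.int64)}

neg = make_target([], [])
pos = make_target([[10, 20, 50, 80]], [1])
print(neg["boxes"].shape, pos["boxes"].shape)
```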

How to convert horizontal bounding box coordinates to oriented bounding box coordinates

I have been trying to detect oriented bounding boxes (OBBs) with Faster RCNN for a long time, but I haven't managed to. I aim to detect objects in the DOTA dataset. I was using the built-in Faster RCNN model in PyTorch, but realized that it does not support OBBs. Then I found another library, Detectron2, which is built on PyTorch. The built-in Faster RCNN network in Detectron2 is actually compatible with OBBs, but I could not make …
Category: Data Science
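Converting a horizontal box to an oriented one is only a change of parameterisation, with the angle set to 0. DOTA's labels, however, are 4-point polygons, and those carry the real orientation. The stdlib-only sketch below shows both conversions to (cx, cy, w, h, angle), the five-parameter form that Detectron2's RotatedBoxes uses.

The sketch assumes the polygon is a rectangle with vertices in order. Detectron2's angle sign and range conventions may differ from the atan2 angle computed here, so check its RotatedBoxes documentation before feeding boxes in. Both helpers are hypothetical.

```python
import math

def hbb_to_obb(x1, y1, x2, y2):
    """An axis-aligned box is an oriented box with angle 0: (cx, cy, w, h, angle)."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1, 0.0)

def poly_to_obb(pts):
    """DOTA-style 4-point polygon [(x, y)] * 4, assumed to be an ordered rectangle.

    Width is the first edge (p0 -> p1), height the second (p1 -> p2), and the angle
    (degrees) is the direction of the first edge, measured with atan2 in image coordinates.
    """
    (x0, y0), (x1, y1), (x2, y2) = pts[0], pts[1], pts[2]
    cx = sum(p[0] for p in pts) / 4
    cy = sum(p[1] for p in pts) / 4
    w = math.hypot(x1 - x0, y1 - y0)
    h = math.hypot(x2 - x1, y2 - y1)
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
    return (cx, cy, w, h, angle)

print(hbb_to_obb(0, 0, 10, 4))
print(poly_to_obb([(0, 0), (10, 0), (10, 4), (0, 4)]))
```

Real DOTA quadrilaterals are not always exact rectangles. For those, a minimum-area rectangle (for example OpenCV's minAreaRect) is the usual fallback.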

How to get feature visualizations for pre-trained ResNet-50 models?

I'm trying to visualize some of the features from a pre-trained ResNet-50 Faster RCNN. The model is downloaded from torchvision: torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True). The examples I've seen use VGG16, which has a very different architecture and can output visualizations of its filters. ResNet's layering is a bit different, and I can't seem to get any features out. Any ideas on what needs to be changed from the simpler VGG example? I'm using this model to feed in images that have …
Category: Data Science

Pre-trained dataset for car damage detection

I'm making a car damage detection model with 2 classes to detect. My dataset has a total of 300 images (some of which I'd use for testing), which is far too few to train the model from scratch. Can I fine-tune a pre-trained model on my dataset to detect the 2 classes, and if so, which one would be best for my problem? P.S. I would prefer to …
Category: Data Science

What kinds of changes can I attempt on my object detector .config file to improve the detection accuracy?

I have trained an object detection model with 2 classes, around 7,500 images, and approximately 10,000 annotations per class. I was able to fine-tune Faster R-CNN with ResNet (V1) from the TensorFlow Object Detection API. As you can see from the green boxes (in an example image not included here), it successfully detected plants it had never seen before. However, there are still several other plants that it needs to detect (shown by the drawn-in red boxes). Assuming my training data has …
Category: Data Science

What does the Region Proposal Network output in Faster-RCNNs?

Does it output corrections (offsets) to the anchor boxes, which were generated using specific aspect ratios and scales? If the answer is yes, suppose I have 3 scales, [8, 16, 32], and 3 aspect ratios, [0.5, 1, 2]. How is it trained so that the first 4 outputs of the box regression layer (assuming the output is W × H × 9 × 4) refer to the offsets/corrections of the anchor box with scale 3 and aspect ratio 0.5?
Category: Data Science
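The RPN does output, for every feature-map position and every anchor, an objectness score and four offsets (tx, ty, tw, th) relative to that anchor. Nothing in the layer itself knows which anchor a channel belongs to. The ordering is a convention fixed by the code and enforced by how the training targets are laid out. When targets are built, the offsets for anchor k, computed against anchor k's box, are always written to channels 4k to 4k+3, so the loss trains those channels to predict that anchor's corrections.

The sketch below shows that bookkeeping. Ratio-major ordering, ratio = height / width and a base size of 16 are assumptions; any fixed order works as long as targets and predictions use the same one.

```python
import math

SCALES = [8, 16, 32]
RATIOS = [0.5, 1, 2]  # ratio = height / width
BASE = 16             # feature stride in pixels (assumed)

def make_anchors(scales=SCALES, ratios=RATIOS, base=BASE):
    """Return a list indexed by anchor k of (ratio, scale, w, h).

    Ratio-major order: anchor k has ratio index k // len(scales)
    and scale index k % len(scales).
    """
    anchors = []
    for r in ratios:
        for s in scales:
            area = (base * s) ** 2
            w = math.sqrt(area / r)
            anchors.append((r, s, w, w * r))
    return anchors

def reg_channels(k):
    """Channels of the 4*k regression output holding (tx, ty, tw, th) for anchor k."""
    return list(range(4 * k, 4 * k + 4))

anchors = make_anchors()
print(len(anchors), anchors[0][:2], reg_channels(0))
```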

Compute IoU for each class in Mask R-CNN

I'm trying to compute the IoU for each class (13 in total) in my dataset, using the matterport Mask R-CNN implementation. So far I have managed to compute the average IoU over all the classes with this code:

```python
def compute_batch_ap(image_ids):
    APs = []
    for image_id in image_ids:
        # Load image
        image, image_meta, gt_class_id, gt_bbox, gt_mask =\
            modellib.load_image_gt(dataset, config, image_id, use_mini_mask=False)
        # Run object detection
        results = model.detect([image], verbose=0)
        # Compute AP
        r = results[0]
        AP, precisions, recalls, …
```
Category: Data Science
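To get one IoU per class instead of a single average, keep running per-class intersection and union pixel counts across all images and divide at the end. This pools pixels over the whole dataset; averaging per-image IoUs is a common alternative.

The NumPy sketch below plugs into the loop above after model.detect. It assumes matterport's layout: masks are (H, W, N) boolean arrays with a parallel array of class IDs, and class 0 is background. accumulate_class_iou and class_iou are hypothetical helpers.

```python
import numpy as np

def accumulate_class_iou(inter, union, gt_masks, gt_ids, pred_masks, pred_ids, num_classes):
    """Add one image's per-class mask intersection/union to running totals (class 0 skipped)."""
    for c in range(1, num_classes + 1):
        # Merge all instances of class c into one binary mask per side.
        gt_c = gt_masks[..., gt_ids == c].any(axis=-1)
        pred_c = pred_masks[..., pred_ids == c].any(axis=-1)
        inter[c] += np.logical_and(gt_c, pred_c).sum()
        union[c] += np.logical_or(gt_c, pred_c).sum()

def class_iou(inter, union):
    """IoU per class; NaN for classes that never appeared in GT or predictions."""
    with np.errstate(invalid="ignore", divide="ignore"):
        return inter / union
```

Initialise inter = np.zeros(14) and union = np.zeros(14). Then, for each image, call accumulate_class_iou(inter, union, gt_mask, gt_class_id, r['masks'], r['class_ids'], 13), and read class_iou(inter, union)[1:] at the end.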

Does Fast-R-CNN model take into account the context?

Does the Fast R-CNN model take into account the local and global context of objects in an image? If it doesn't, are there other models that do, and which of them are efficient at detecting small objects in images? Especially among these: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.