I set up the YOLOv4 PyTorch framework in Google Colab by cloning https://github.com/roboflow-ai/pytorch-YOLOv4.git and generated checkpoints by training it. Since we need a more robust model, I trained again, this time passing in the pretrained checkpoint, but the loss started at a high value, just like in the first training run. The training command is !python train.py -b 2 -s 1 -l 0.001 -g 0 -pretrained ./Yolov4_epoch100_latest.pth -classes 1 -dir ./train -epochs 100. I am not sure whether my pretrained checkpoint is actually being used …
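For debugging, here is the kind of sanity check I have in mind (a sketch only: the commented layer/constructor names are hypothetical and depend on the repo's model definition):

```python
import torch

# Check that the checkpoint file actually contains weights and loads cleanly.
ckpt = torch.load('./Yolov4_epoch100_latest.pth', map_location='cpu')

# Some training scripts save a bare state_dict, others wrap it in a dict.
state = ckpt.get('model_state_dict', ckpt) if isinstance(ckpt, dict) else ckpt
print(f'{len(state)} tensors, first keys: {list(state)[:3]}')

# If you can build the model, compare a parameter before and after loading:
# an unchanged tensor means the checkpoint was silently ignored.
# model = build_model(n_classes=1)                     # hypothetical constructor
# before = next(model.parameters()).clone()
# model.load_state_dict(state, strict=False)
# print('weights changed:', not torch.equal(before, next(model.parameters())))
```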
I want to detect whether a car in a video is moving forward or backward (backward here means the car is reversing). I am already able to detect the bounding box on the car; now I want to do the post-processing that tells me whether the car is moving forward or backward.
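For concreteness, this is the kind of post-processing I mean (a minimal sketch assuming a fixed camera, one tracked car, and "forward" mapped to decreasing y in the image; the mapping would need calibrating per scene):

```python
from collections import deque

history = deque(maxlen=10)  # recent bounding-box centers (x, y)

def update_direction(bbox):
    """bbox = (x1, y1, x2, y2) for the current frame."""
    cx, cy = (bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2
    history.append((cx, cy))
    if len(history) < history.maxlen:
        return 'unknown'                      # not enough motion evidence yet
    dy = history[-1][1] - history[0][1]       # net vertical displacement
    if abs(dy) < 2:                           # pixels; tune for your resolution
        return 'stationary'
    return 'forward' if dy < 0 else 'backward'
```

The change in box area over the same window additionally tells you whether the car is approaching or receding, which helps when the camera faces the road head-on.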
I would like to create an application that adds image filters (Snapchat-style) to photos of cats or chairs (just for the sake of this question). To do that properly, I thought of using Active Shape Modelling algorithms to obtain a model to apply the filters to. I trained an object detection model (YOLOv5) to identify those items in an image, so I now have a bounding box around each item, but I still don't know its exact shape …
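One stopgap I have considered before building a full shape model is segmenting inside the detected box with GrabCut (a sketch; the file name and box coordinates are placeholders):

```python
import cv2
import numpy as np

img = cv2.imread('cat.jpg')
rect = (50, 40, 200, 180)                     # (x, y, w, h) from the detector

# GrabCut initialized from the rectangle separates foreground from background.
mask = np.zeros(img.shape[:2], np.uint8)
bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)

# Pixels marked (probably) foreground form a rough object silhouette.
shape_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                      255, 0).astype(np.uint8)
contours, _ = cv2.findContours(shape_mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
print(len(contours), 'contour(s) found')
```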
If you train your YOLO model only on grayscale images to detect cars, would it then be able to recognise a car in a colored image as well? If so, can I assume that YOLO considers only object shape, not color? Kindly clarify.
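A quick empirical test one could run (using the stock yolov5s weights as a stand-in; swap in the grayscale-trained model, and 'car.jpg' is a placeholder): feed the same image in color and as 3-channel grayscale and compare the detections.

```python
import cv2
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

bgr = cv2.imread('car.jpg')
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
gray3 = cv2.cvtColor(gray, cv2.COLOR_GRAY2RGB)   # grayscale replicated to 3 channels

for name, img in [('color', rgb), ('gray', gray3)]:
    results = model(img)
    print(name, results.pandas().xyxy[0][['name', 'confidence']])
```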
I need to detect objects in multiple video streams in real time (or close to it, say 10 FPS). How many GPUs do I need to run object detection with YOLOv3 or MobileNet on, say, 10 video streams? Is it possible to use a CPU or something else instead? I don't need an exact number; I just need to understand the scalability picture and the cost per stream.
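For scale, this is the back-of-the-envelope arithmetic I am working from (the per-GPU throughput below is an assumed figure, not a benchmark; it has to be measured on the actual hardware, model, and resolution):

```python
streams = 10
target_fps_per_stream = 10
gpu_inferences_per_sec = 60          # assumed: e.g. YOLOv3 at 416x416, batched

required = streams * target_fps_per_stream            # 100 inferences/s
gpus_needed = -(-required // gpu_inferences_per_sec)  # ceiling division -> 2
print(f'{required} inferences/s -> ~{gpus_needed} GPU(s)')
```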
I'm studying Andrew Ng's Convolutional Neural Networks course and am in Week 3, which deals with object detection using the YOLO algorithm. I don't understand one section of the programming assignment that uses a function called 'scale_boxes'. This is what the course materials say about the function: "There are a few ways of representing boxes, such as via their corners or via their midpoint and height/width. YOLO converts between a few such formats at different times, …"
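My current reading of what such a function might do (my own reconstruction, not the course's code) is that the model predicts boxes in relative coordinates, and scale_boxes stretches them to the original image's pixel dimensions:

```python
import numpy as np

def midpoint_to_corners(box):
    """(x_center, y_center, w, h) -> (y1, x1, y2, x2): one of the conversions."""
    x, y, w, h = box
    return np.array([y - h / 2, x - w / 2, y + h / 2, x + w / 2])

def scale_boxes(boxes, image_shape):
    """Scale (y1, x1, y2, x2) boxes from relative [0, 1] units to pixels."""
    height, width = image_shape
    return boxes * np.array([height, width, height, width])

corners = midpoint_to_corners((0.5, 0.5, 0.2, 0.4))    # one box, relative units
print(scale_boxes(corners, (720, 1280)))               # pixels in the original image
```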
Context: I'm trying to use an object detection model (YOLOv5) to detect damage/defects on cars (dents, scratches, cracks). Right now the goal is a minimum viable prototype: a model able to detect these defects in static images; if successful, this might be used for a real-time quality-control use case in a car manufacturing plant. Problem: For a pilot test, I used a really small image dataset (~100 images) from Kaggle, whose bounding boxes I labeled myself. With these …
Problem Statement: I am given 2 sets of images, all without annotations or labels. First set: images of grocery store shelves (captured in the grocery stores). Second set: close-up images of the products kept on those store shelves. What I am trying to achieve: I want to first locate and then predict a bounding box for a Product in the set of images of Grocery …
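One label-free starting point I am considering (a sketch, not a full solution; the file names are placeholders): match local features between a product close-up and a shelf image, then take the bounding box of the matched keypoints as a rough product location.

```python
import cv2

product = cv2.imread('product.jpg', cv2.IMREAD_GRAYSCALE)
shelf = cv2.imread('shelf.jpg', cv2.IMREAD_GRAYSCALE)

# ORB keypoints + brute-force Hamming matching, keeping the 50 best matches.
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(product, None)
kp2, des2 = orb.detectAndCompute(shelf, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:50]

# Bounding box around the shelf-side keypoints of the best matches.
xs = [kp2[m.trainIdx].pt[0] for m in matches]
ys = [kp2[m.trainIdx].pt[1] for m in matches]
print('rough box:', (min(xs), min(ys), max(xs), max(ys)))
```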
I wish to accomplish the following task in PyTorch: I have the COCO dataset, and each data sample is used in training YOLOv3. After being processed by the model, a sample should be deleted if it satisfies a certain condition, so that it is no longer used for training in further epochs. I now have two questions regarding implementation: 1) How do I process each sample individually? Do I go about this by setting batch size …
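This is the shape of solution I am picturing (a sketch: `dataset`, `train_step`, and the drop condition are placeholders, and detection targets usually need a custom collate_fn, omitted here). Wrapping the dataset so it also returns each sample's index ties every loss to one sample, and "deleting" a sample is then just removing its index from the active list.

```python
from torch.utils.data import DataLoader, Dataset, Subset

class WithIndex(Dataset):
    """Wraps a dataset so each item also carries its original index."""
    def __init__(self, base):
        self.base = base
    def __len__(self):
        return len(self.base)
    def __getitem__(self, i):
        image, target = self.base[i]
        return i, image, target

active = list(range(len(dataset)))              # all samples start active
indexed = WithIndex(dataset)

for epoch in range(num_epochs):
    # batch_size=1 processes each sample individually (question 1).
    loader = DataLoader(Subset(indexed, active), batch_size=1, shuffle=True)
    keep = set(active)
    for idx, image, target in loader:
        loss = train_step(image, target)        # your forward/backward pass
        if loss.item() < 0.01:                  # hypothetical drop condition
            keep.discard(idx.item())            # never drawn in later epochs
    active = sorted(keep)
```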
I am going to write a YOLOv4 real-time object detector, and it has to detect the car first and then the vehicle's plate number; it should not look for a plate number if there is no car: first the car, then the number on the car. Is that possible? Is it okay to use the Darknet framework?
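The "plate only if inside a car" rule can live entirely in post-processing, something like this sketch (boxes as (x1, y1, x2, y2); the detector outputs here are made-up examples):

```python
def inside(inner, outer):
    """True if box `inner` lies fully within box `outer`."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def plates_on_cars(car_boxes, plate_boxes):
    """Keep only plate detections contained in some detected car."""
    return [p for p in plate_boxes
            if any(inside(p, c) for c in car_boxes)]

# Example: one car, two plate candidates; the stray one is discarded.
print(plates_on_cars([(10, 10, 300, 200)],
                     [(120, 150, 180, 180), (400, 50, 450, 80)]))
```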
I'm implementing the YOLOv1 paper from scratch in PyTorch. I managed to implement the model, define its loss function correctly, and train it; it converges and performs very well. The thing is, I want to calculate the mAP (mean average precision) of the model, which I'm stuck on, because I don't know how to calculate the true positives and false positives from the predictions in a reasonable way. So any help on how to compute the true and false positives …
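For reference, this is the matching logic I have seen described (a sketch of greedy matching at IoU 0.5, per class per image; not any official evaluation code):

```python
import numpy as np

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def tp_fp_flags(preds, gts, thr=0.5):
    """preds: list of (confidence, box); gts: list of boxes.
    Returns 1/0 flags (TP/FP) in descending confidence order."""
    flags, matched = [], set()
    for conf, box in sorted(preds, key=lambda p: -p[0]):
        ious = [0.0 if g in matched else iou(box, gt) for g, gt in enumerate(gts)]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= thr:
            matched.add(best)          # each ground truth can be matched once
            flags.append(1)            # true positive
        else:
            flags.append(0)            # false positive: low IoU or duplicate
    return flags
```

Ground-truth boxes left unmatched are the false negatives; accumulating these flags over the whole dataset per class gives the precision-recall curve from which AP, and then mAP, is computed.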
I have two questions about dense prediction in the YOLOv4 paper. What is meant by the statement that the (hard negative, online hard) example mining method is not applicable to a one-stage object detector because this kind of detector belongs to the dense prediction architecture? And why does dense prediction not apply to two-stage detectors?
Regarding training, validation and test datasets, I think there is a bit of confusion in the literature's terminology. I'm using this repo to train a model on 9 custom classes: https://github.com/ultralytics/yolov3. I don't understand whether the validation set here is used in the training process (i.e. to tune hyperparameters, etc.) or only to calculate some metrics (i.e. as a "test" set of UNSEEN data). Could anyone help me? Thank you.
I trained a model on my dataset for object detection, using 1500 samples. Now I'm not quite sure how to benchmark the model. What is the procedure before putting it to use? Are the metrics in the output below reliable, or should I test my model on a separate dataset? I want to be sure that my model is good enough before using it. I got the following results:
wandb: Run history:
wandb: metrics/mAP_0.5        ▁▅▆▆▇▇▇▇▇▇▇▇████████████████████████████
wandb: metrics/mAP_0.5:0.95   ▁▄▅▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇██████████████████████
wandb: metrics/precision …
After reading some object detection papers (R-CNN variants, YOLO, ...), I'm wondering whether there is a detector that can detect objects of classes it was not trained on. For example, my model is not trained to detect buildings, but is it possible to make it detect buildings or some vehicles anyway, just from the pixel data?
I came upon a recent blog post on Medium that lists the advancements of YOLOX over its predecessor YOLOv5, among them: YOLOX uses a decoupled head. Can someone please list the major optimizations made to the YOLO architecture over time?
I am running a YOLOv5 detector on the video below to detect persons in the stream, and it is giving me satisfactory results. I need to know whether I should train the model on my custom dataset or continue to use the standard weights from the Ultralytics GitHub repo. When should one train the model on their own custom data? Also, how many annotated images should I arrange if I want to train on my custom dataset?
We all know the CTRL+F "Find text..." feature in text editors and browsers. I'd like to study the available algorithms for doing something similar on an image. Example of the UI/UX: let's say you have an electronic schematic. You draw a rectangle around a diode, and the algorithm then automatically finds all similar diodes in this image. To find a pattern in an image, I know (and have already used some of) the classical tools: OpenCV matchTemplate …
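For context, this is the matchTemplate baseline I have in mind (file names are placeholders; note that matchTemplate is not rotation- or scale-invariant, which is exactly the limitation that motivates the fancier methods):

```python
import cv2
import numpy as np

image = cv2.imread('schematic.png', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('diode.png', cv2.IMREAD_GRAYSCALE)   # the drawn rectangle
h, w = template.shape

# Normalized cross-correlation score at every position, thresholded.
scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(scores >= 0.8)

# One box per match above the threshold (overlapping hits would need
# non-maximum suppression in practice).
for x, y in zip(xs, ys):
    cv2.rectangle(image, (x, y), (x + w, y + h), 255, 1)
cv2.imwrite('matches.png', image)
```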
The task is to detect rotated alphanumeric characters embedded on colored shapes. We will have an aerial view of the object (from a UAS: Unmanned Aerial System), something of this sort (one uppercase letter/number per image). We have to report 5 features: shape, shape color, alphanumeric, alphanumeric color, and alphanumeric orientation. Right now, I am focusing on just detecting the alphanumeric and the shape. Using OpenCV, I have created a sample image by embedding a (shape + alphanumeric) image on an aerial view …