What are the major architecture changes in the YOLO versions?

I came across a recent blog post on Medium that lists the advancements of YOLOX over its predecessor, YOLOv5.

The advancements are:

  1. YOLOX uses a decoupled head

Can someone please list the major optimizations made to the YOLO architecture over time?



  1. YOLOv1 is built on the Darknet framework, with its convolutional layers pretrained on the ImageNet 1000-class dataset. The network has 24 convolutional layers followed by 2 fully connected layers. Because it divides the input image into a 7 × 7 grid and each cell predicts only a small, fixed number of boxes, it struggles with small objects that appear in clusters and with objects whose aspect ratios differ from those seen in training.
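  2. YOLOv2 adopts anchor boxes, an idea borrowed from the Region Proposal Network of Faster R-CNN and also used by the Single Shot MultiBox Detector (SSD), in place of YOLOv1's fully connected box predictions. It also added other features such as Batch Normalization, Multi-Scale Training, the Darknet-19 backbone, and so on.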
  3. YOLOv3 uses a few tricks to improve training and increase performance, including changes to bounding box and class prediction, multi-scale predictions in the style of Feature Pyramid Networks, and the Darknet-53 backbone. Darknet-53 works as a feature extractor with 53 convolutional layers, composed mainly of 3 × 3 and 1 × 1 filters with shortcut connections.
  4. The YOLOv4 paper describes a detector in terms of a backbone, a neck, and a head. The head performs dense prediction in one-stage detectors such as YOLO, while sparse prediction belongs to two-stage detectors. The backbone is the feature extraction network, here CSPDarknet53. Its Cross Stage Partial (CSP) design splits the feature map of a stage into two parts: one passes through the convolutional layers and the other bypasses them, after which the two are merged. The neck sits between the backbone and the head and aggregates features from different stages. The head (dense prediction) locates bounding boxes and classifies them.
  5. YOLOv5 is controversial because it was released only a few weeks after YOLOv4. Since the author has not published a paper for it yet, Alexey Bochkovskiy cataloged it as YOLOv4 in other frameworks.
  6. YOLOX adds improvements such as a decoupled head, an anchor-free design, and an advanced label assignment strategy. I have not tested it, so I cannot comment on it further.
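
To make the YOLOv1 grid limitation from item 1 more concrete, here is a minimal Python sketch of how a 7 × 7 grid assigns objects to cells. It is only an illustration, not code from any YOLO implementation: the function names and the object centers are hypothetical, and the values B = 2 boxes per cell and C = 20 classes are assumed from the PASCAL VOC setup described in the YOLOv1 paper.

```python
from collections import defaultdict

S = 7   # grid size (from the answer above)
B = 2   # boxes predicted per cell (assumed, YOLOv1 paper setting)
C = 20  # number of classes (assumed, PASCAL VOC)


def output_size(s=S, b=B, c=C):
    """Number of values in an S x S x (B*5 + C) output tensor.

    Each box contributes 5 values: x, y, w, h and a confidence score.
    """
    return s * s * (b * 5 + c)


def responsible_cell(cx, cy, s=S):
    """Return (row, col) of the grid cell that contains a box center.

    cx and cy are the center coordinates normalized to [0, 1].
    """
    col = min(int(cx * s), s - 1)  # clamp so that cx == 1.0 stays in range
    row = min(int(cy * s), s - 1)
    return row, col


def group_by_cell(centers, s=S):
    """Map each grid cell to the names of the objects centered in it."""
    cells = defaultdict(list)
    for name, (cx, cy) in centers.items():
        cells[responsible_cell(cx, cy, s)].append(name)
    return dict(cells)


# Hypothetical normalized object centers: three small birds close
# together and one car in the bottom-left corner of the image.
centers = {
    "bird_a": (0.50, 0.50),
    "bird_b": (0.52, 0.51),
    "bird_c": (0.55, 0.53),
    "car": (0.10, 0.90),
}

if __name__ == "__main__":
    print("output values:", output_size())
    for cell, names in group_by_cell(centers).items():
        crowded = len(names) > B
        print(cell, names, "<- more objects than boxes" if crowded else "")
```

In this sketch all three birds fall into the same cell, which under the assumed settings can predict only two boxes, so at least one bird cannot get its own prediction. This is the clustering problem that the anchor-based and multi-scale designs of later versions try to reduce.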
