Why are axis-aligned bounding boxes used in object detection?

I think I understand why, in object detection, the result is a rectangle:

it is a simple shape that can be defined by 4 variables (the coordinates of two opposite corners, or one pair of coordinates plus a width and height)

So a more complicated shape might require more parameters, which could complicate things. But what if, for example, a circle were used? That would need just 3 parameters: the coordinates of the center plus the radius. Is there something obvious I am missing?
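To make the parameter counts concrete, here is a small Python sketch (just my own illustration, not the API of any detector) of the two equivalent 4-parameter box encodings and the 3-parameter circle:

```python
# Illustrative sketch: the two equivalent 4-parameter encodings of an
# axis-aligned box, plus the 3-parameter circle mentioned above.

def corners_to_center(x1, y1, x2, y2):
    """(x1, y1, x2, y2) opposite-corner format -> (cx, cy, w, h) format."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

def center_to_corners(cx, cy, w, h):
    """(cx, cy, w, h) format -> (x1, y1, x2, y2) opposite-corner format."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

circle = (50.0, 40.0, 12.5)  # (cx, cy, r): only 3 parameters, as in the question

print(corners_to_center(10, 20, 110, 220))        # (60.0, 120.0, 100, 200)
print(center_to_corners(60.0, 120.0, 100, 200))   # (10.0, 20.0, 110.0, 220.0)
```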

And still regarding the bounding box, I wonder what would happen if a fifth parameter were added to describe the angle of the bbox. For example, consider the iPhone in this image:

I might be thinking about it wrong, but in my head, the network could have an easier time learning to detect a rotated bbox that aligns with the actual iPhone than an axis-aligned one. For a human, it's also easier (and, I would argue, more intuitive) to draw the rotated bounding box than the axis-aligned one, isn't it?

And regardless of whether it would be easier for the network, a rotated bounding box would be a more precise detection result.
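To be explicit about what I mean by a fifth parameter, here is a rough Python sketch of one possible (cx, cy, w, h, theta) layout for a rotated box and how it maps back to corner points (the names and convention are just my own illustration, not a standard from any particular detector):

```python
import math

def rotated_box_to_corners(cx, cy, w, h, theta):
    """Return the 4 corners of a w-by-h box rotated by theta radians about its center."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * cos_t - dy * sin_t,
             cy + dx * sin_t + dy * cos_t) for dx, dy in offsets]

# Example: a 60x20 box tilted by 30 degrees, roughly like the phone in the photo.
print(rotated_box_to_corners(100.0, 100.0, 60.0, 20.0, math.radians(30)))
```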

Tags: object-detection, computer-vision, neural-network



There is some work on this; I came across this blog post:
https://developer.nvidia.com/blog/detecting-rotated-objects-using-the-odtk/
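To illustrate part of the trade-off (this is my own sketch, not code from the ODTK blog): IoU between axis-aligned boxes, which detectors rely on heavily for training targets and NMS, is just a few min/max operations, whereas IoU between rotated boxes requires general polygon intersection. Here shapely's Polygon is used purely for illustration:

```python
from shapely.geometry import Polygon  # used only to illustrate rotated-box overlap

def aabb_iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format: a few min/max ops."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def rotated_iou(corners_a, corners_b):
    """IoU of two rotated boxes given as lists of 4 (x, y) corners: needs polygon clipping."""
    pa, pb = Polygon(corners_a), Polygon(corners_b)
    inter = pa.intersection(pb).area
    return inter / (pa.area + pb.area - inter)

print(aabb_iou((0, 0, 10, 10), (5, 5, 15, 15)))                  # ~0.143
print(rotated_iou([(0, 0), (10, 0), (10, 10), (0, 10)],
                  [(5, 5), (15, 5), (15, 15), (5, 15)]))          # same boxes, same result
```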
