OpenCV vs YOLO Co-ordinates

June 13, 2023

I recently had to fine-tune around 100 Computer Vision AI models for an assignment I’m completing as part of my Master’s Degree.

To draw a bounding box on an image, OpenCV expects the top-left co-ordinate plus the width and height of the box. My annotations, however, were in the form of image colour maps, like so (image taken from the CamVid dataset):

Using some OpenCV code, I converted these segment maps to bounding boxes, generating an image like so instead (taking the cars only as an example):

All looked great up to this point.

Unfortunately, I was getting pretty bad results from my AI models: we’re talking 4% mAP50. This is the mean average precision across all classes, where a predicted box only counts as correct if its intersection over union (IoU) with the true box is at least 50%. Detection models usually aren’t expected to score very high on mAP50, but this result was abysmally low.

Only when I removed 10 of the 11 classes and tried to train a detector on a single class did I realise that all of the labels were offset to the left by 50% of the box width and upwards by 50% of the box height.
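For a sense of how far off that is, here is a minimal sketch that measures how much a shifted label still overlaps the object it is supposed to mark, using intersection over union (IoU), the usual overlap measure for detection boxes. The `iou` helper and the box numbers are made up for illustration and are not from my actual dataset.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes, with (x, y) the top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping region (zero if the boxes don't overlap).
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


# Hypothetical box: top-left (100, 100), 80 pixels wide, 40 pixels tall.
true_box = (100, 100, 80, 40)
# The same box shifted left by half its width and up by half its height.
shifted_box = (100 - 80 / 2, 100 - 40 / 2, 80, 40)

overlap = iou(true_box, shifted_box)
print(round(overlap, 3))  # 0.143
```

A box shifted this way keeps only a quarter of its area on the object, which works out to an IoU of 1/7 whatever the box size.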

This immediately triggered my spidey-senses: it was far too uniform to be a fluke. I eventually realised an important difference between OpenCV and the YOLO text annotation format.

To validate that the bounding boxes were correct, I had used OpenCV to draw them and check them visually… after using OpenCV to generate those same bounding boxes. It turns out that OpenCV both expects and returns the following for bounding boxes:

Width and Height of the bounding box
The top left co-ordinate (x, y)

In contrast to this, the YOLO text annotation format expects:

Width and Height of the bounding box
The centre co-ordinate (x, y)

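The bounding boxes were all offset upwards and to the left because I had written the top-left co-ordinate where YOLO expects the centre. To convert from centre to top left, one needs to perform the following calculation: top_left_x = centre_x - (width / 2), and likewise top_left_y = centre_y - (height / 2). Going the other way, which is what my annotations needed, centre_x = top_left_x + (width / 2) and centre_y = top_left_y + (height / 2).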
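As a rough sketch of the fix, the conversion from an OpenCV-style box to a YOLO label line might look like the following. The function name, the example box, the image size and the class index are all assumptions for illustration. One extra detail the code assumes: YOLO label files normally store all four values as fractions of the image width and height rather than as pixels.

```python
def opencv_to_yolo(x, y, w, h, img_w, img_h):
    """Convert a top-left pixel box (x, y, w, h) into YOLO's normalised
    (centre_x, centre_y, width, height)."""
    # Move from the top-left corner to the centre of the box.
    centre_x = x + w / 2
    centre_y = y + h / 2
    # Normalise by the image size so every value lies between 0 and 1.
    return (centre_x / img_w, centre_y / img_h, w / img_w, h / img_h)


# Hypothetical example: a 100x50 box with top-left corner (200, 100)
# in an image assumed to be 960x720 pixels.
yolo_box = opencv_to_yolo(200, 100, 100, 50, img_w=960, img_h=720)

class_id = 0  # hypothetical class index
label_line = f"{class_id} " + " ".join(f"{v:.6f}" for v in yolo_box)
print(label_line)
```

Skipping the two `+ w / 2` and `+ h / 2` lines reproduces exactly the up-and-left offset I was seeing.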

Once I solved this issue and regenerated all of my annotations in the correct format, the mAP50 of my models jumped from 4% to 40%.
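In hindsight, a check that reads the saved YOLO labels back and converts them to corners for drawing would probably have caught this much earlier, since it doesn't simply reuse the values OpenCV produced. Here is a minimal sketch of that round trip. The function name, the label line and the 960x720 image size are assumptions for illustration.

```python
def yolo_to_opencv(label_line, img_w, img_h):
    """Parse one YOLO label line ("class cx cy w h", normalised) and return
    (class_id, x, y, w, h) in pixels, with (x, y) the top-left corner."""
    parts = label_line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(p) for p in parts[1:5])
    # Scale the normalised values back up to pixels.
    box_w = w * img_w
    box_h = h * img_h
    # Move from the centre back to the top-left corner.
    x = cx * img_w - box_w / 2
    y = cy * img_h - box_h / 2
    return class_id, round(x), round(y), round(box_w), round(box_h)


# Hypothetical label line, as it might appear in a YOLO .txt file.
class_id, x, y, w, h = yolo_to_opencv(
    "0 0.260417 0.173611 0.104167 0.069444", img_w=960, img_h=720
)

# The two opposite corners a drawing call such as cv2.rectangle would need.
top_left, bottom_right = (x, y), (x + w, y + h)
print(top_left, bottom_right)  # (200, 100) (300, 150)
```

If the boxes drawn from these corners don't sit on the objects, the label file is wrong, regardless of how good the boxes looked before they were saved.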

Lesson learned :)