IoU Loss Functions for Faster & More Accurate Object Detection

This blog post explores IoU-based loss functions for object detection, including IoU, GIoU, DIoU, and CIoU loss.

Object detection is one of the most important challenges in computer vision, and deep learning-based methods solve it very effectively. To solve a problem with deep learning, we first formulate it as an optimization problem and then solve it with an iterative optimization technique such as gradient descent. The choice of loss function is therefore crucial when modeling object detection. Object detection generally needs two loss functions: one for object classification and one for bounding box regression. This article focuses on IoU-based losses for bounding box regression (GIoU, DIoU, and CIoU loss). First, though, we will build an intuitive understanding of the regression loss for object detection in general.

This article is most useful for readers who:

  • Want to read and understand the paper Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression, by H. Rezatofighi et al.
  • Want insights into the paper Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression, by Z. Zheng et al.
  • Want a general understanding of bounding box regression loss functions for object detection.

In this article, we discuss only the regression loss for object detection. If you are not already familiar with object detection fundamentals and pipelines, consider reading an introduction to them first.

Before turning to IoU-based losses, let us look at the traditional regression losses used for bounding boxes (MSE, MAE, or a combination of the two) and their pitfalls, which show why we need a better loss function.

MSE Loss Function

MSE Loss for Object Detection Regression

In the image above, the prediction for the larger object looks very close to its ground truth, while the prediction for the smaller object looks far from its ground truth. However, the absolute offset of each prediction from its ground truth is the same in both cases, so the MSE loss and its gradient are also the same. Let's look at the gradients of the MSE and MAE loss functions.
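To make this concrete, here is a minimal Python sketch. It assumes boxes in corner format (x1, y1, x2, y2) and uses hypothetical coordinates in which both predictions are offset by 10 pixels in x and y; the helper names `iou` and `mse` are illustrative, not from any library. The MSE over the corner coordinates is identical for both boxes, while the IoU differs sharply.

```python
# Boxes are (x1, y1, x2, y2): top-left and bottom-right corners, in pixels.

def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def mse(a, b):
    """Mean squared error over the four corner coordinates."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / 4


# Hypothetical boxes: both predictions are offset by 10 px in x and y.
large_gt, large_pred = (0, 0, 200, 200), (10, 10, 210, 210)
small_gt, small_pred = (0, 0, 20, 20), (10, 10, 30, 30)

print(mse(large_gt, large_pred), mse(small_gt, small_pred))  # 100.0 100.0
print(round(iou(large_gt, large_pred), 3), round(iou(small_gt, small_pred), 3))  # 0.822 0.143
```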

Gradients of MSE and MAE Loss Functions

Let us plot MAE and MSE against \Delta x, the error between a predicted coordinate and its ground-truth value, to understand how the loss and its gradient behave.

MSE and MAE Loss Function and Its Gradients

The plot above shows that both the loss and its gradient are functions of \Delta x alone: for a given \Delta x, the gradient is fixed regardless of the size of the box. This implies that both predictions contribute equally to the model's weight update, even though, intuitively, the prediction for the smaller object is worse and should contribute more. This calls for a better loss function.
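Written out per coordinate, with \Delta x as the error, the two losses and their derivatives are standard results. Neither derivative involves the box size:

```latex
\begin{aligned}
\mathcal{L}_{MSE}(\Delta x) &= (\Delta x)^2, &
\frac{\partial \mathcal{L}_{MSE}}{\partial \Delta x} &= 2\,\Delta x\\
\mathcal{L}_{MAE}(\Delta x) &= |\Delta x|, &
\frac{\partial \mathcal{L}_{MAE}}{\partial \Delta x} &= \operatorname{sign}(\Delta x) \quad (\Delta x \neq 0)
\end{aligned}
```

For example, a 10-pixel error yields a gradient of 20 under MSE and 1 under MAE (per coordinate, before any averaging), whether the box is 20 or 200 pixels wide.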

IoU Loss Functions

Why are IoU-based loss functions a better choice than MSE or MAE?

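Object detection is usually evaluated with mAP (mean average precision), which decides whether a detection is correct by thresholding its IoU (Intersection over Union) with the ground truth. An IoU-based loss therefore optimizes the same quantity the metric measures, which tends to improve mAP. IoU is also scale-invariant: it depends on the relative overlap rather than the absolute coordinate error, so it does not suffer from the small-versus-large box problem shown earlier.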

Problem with Typical IoU Loss Function

IoU Loss Function

The problem with the IoU loss above, L_{IoU} = 1 - IoU, is that when the prediction and the ground truth do not overlap, IoU is zero and so is the gradient. Why is the gradient zero?

When IoU is zero, the loss equals 1 (one minus zero) no matter how far apart the boxes are. Since small changes to the predicted box do not change the loss, its gradient with respect to the box coordinates is zero.

Because early predictions during training often do not overlap the ground truth at all, a loss whose gradient vanishes in exactly that situation is a poor choice.
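The following minimal sketch shows the problem numerically. It assumes boxes in (x1, y1, x2, y2) format, and `iou` and `iou_loss` are hypothetical helper names. A prediction 2 pixels away and one 90 pixels away from the ground truth get the same IoU loss of 1. Nudging the nearby box slightly leaves the loss unchanged, so a finite-difference estimate of the gradient is zero.

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def iou_loss(pred, gt):
    return 1.0 - iou(pred, gt)


gt = (0, 0, 10, 10)
near = (12, 0, 22, 10)   # 2 px to the right of the ground truth
far = (100, 0, 110, 10)  # 90 px to the right of the ground truth
print(iou_loss(near, gt), iou_loss(far, gt))  # 1.0 1.0

# Finite-difference check: moving the near box 0.1 px left leaves the loss unchanged.
nudged = (11.9, 0, 21.9, 10)
print(iou_loss(nudged, gt) - iou_loss(near, gt))  # 0.0
```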

Can we modify the above equation to use it as a proper loss function?

Yes. GIoU (Generalized Intersection over Union) modifies IoU so that it can serve as a better loss function.

GIoU (Generalized IoU) Metric and Loss Function 

Before going into the GIoU metric/loss, look at the image below.

MSE or MAE vs. IoU or GIoU

In the image above, there are two sets of examples. In (a), the boxes are represented by their two corners (x_1, y_1, x_2, y_2); in (b), by their center and size (x_c, y_c, w, h). For all three cases in each set, the distance between the representations of the two rectangles is the same ((a) the \ell_2-norm distance \|\cdot\|_2 and (b) the \ell_1-norm distance \|\cdot\|_1), yet their IoU and GIoU values are very different.

This means that MSE or MAE does not adequately capture the quality of the predicted bounding box, which makes them poor loss functions for bounding box regression.

As discussed earlier, IoU cannot distinguish between a near and a far prediction when neither overlaps the ground truth. GIoU can.

The equation for GIoU is:

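 GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|}

where A and B are the two boxes being compared (e.g., the prediction and the ground truth) and |\cdot| denotes area.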

Here, C is the smallest convex shape that encloses both A and B. In practice, the authors use the smallest axis-aligned rectangle that encloses both boxes.

Let us see how we can calculate C in the figure below.

How to Calculate C
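As a worked example, take two hypothetical boxes A = (0, 0, 4, 4) and B = (2, 2, 6, 6) in (x_1, y_1, x_2, y_2) format and write |\cdot| for area. The enclosing rectangle C takes the minimum of the top-left coordinates and the maximum of the bottom-right coordinates:

```latex
\begin{aligned}
C &= (\min(0,2), \min(0,2), \max(4,6), \max(4,6)) = (0, 0, 6, 6), & |C| &= 6 \cdot 6 = 36\\
|A \cap B| &= (4-2)(4-2) = 4, & |A \cup B| &= 16 + 16 - 4 = 28\\
IoU &= \tfrac{4}{28} = \tfrac{1}{7} \approx 0.143\\
GIoU &= \tfrac{1}{7} - \tfrac{36-28}{36} = \tfrac{1}{7} - \tfrac{2}{9} = -\tfrac{5}{63} \approx -0.079
\end{aligned}
```

Even though the boxes overlap, GIoU is negative here because more than a fifth of C is covered by neither box.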

Note that 0 \leq IoU \leq 1, whereas -1 < GIoU \leq 1.

GIoU loss:

 L_{GIoU} = 1 - GIoU = 1 - IoU + \frac{|C \setminus (A \cup B)|}{|C|}

When IoU = 0, the GIoU loss is no longer constant: it grows as the prediction moves farther from the ground truth, because the empty part of C grows. So even with zero overlap, the loss has a nonzero gradient that pushes the predicted box toward the ground truth, which makes it a better loss function than IoU loss.
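A minimal Python sketch of the GIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format follows; `giou` and `giou_loss` are hypothetical helper names, not a library API. With the same ground truth, a non-overlapping prediction 2 pixels away and one 90 pixels away now receive different losses, so the loss carries distance information even when IoU is zero.

```python
def giou(a, b):
    """Generalized IoU of two axis-aligned (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # C: the smallest axis-aligned rectangle enclosing both boxes
    area_c = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (area_c - union) / area_c


def giou_loss(pred, gt):
    return 1.0 - giou(pred, gt)


gt = (0, 0, 10, 10)
near = (12, 0, 22, 10)   # no overlap, 2 px gap
far = (100, 0, 110, 10)  # no overlap, 90 px gap
print(round(giou_loss(near, gt), 3), round(giou_loss(far, gt), 3))  # 1.091 1.818
```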

GIoU loss also outperforms IoU and MSE loss in practice, as the comparison table below shows.

GIoU Loss is Better than MSE and IoU Loss

The table compares the performance of YOLOv3 trained on the COCO dataset with different loss functions: (i) MSE, the original YOLOv3 loss, (ii) IoU loss, and (iii) GIoU loss.

GIoU loss wins by a considerable margin.

DIoU (Distance IoU) Loss Function

Before going into detail about DIoU, look at the figure below.

IoU and GIoU vs. DIoU

In the above figure, the green bounding box is the ground truth, and the red is a prediction.

As shown above, the IoU and GIoU losses are the same in all three cases, but the DIoU loss differs. DIoU loss is at its minimum in the third case (from left to right), where the centers of the prediction and the ground truth coincide, and there it equals the IoU and GIoU losses. This means DIoU loss pushes the predicted center toward the ground-truth center.

DIoU Loss Function

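DIoU loss adds a penalty on the distance between box centers to the IoU loss. Let d be the Euclidean distance between the centers of the predicted and ground-truth boxes, and C the diagonal length of the smallest box enclosing both, as illustrated below. (Note that C here is a length, not the enclosing box itself as in GIoU.) DIoU loss is defined as follows: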

Distance in DIoU

 L_{DIoU} = 1 - IoU + \frac{d^2}{C^2}
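Below is a minimal sketch of the DIoU loss under the same assumptions (corner-format boxes, hypothetical helper name `diou_loss`). Both example predictions lie inside the ground truth and have the same IoU (0.16). Because the ground truth is then itself the enclosing box, their GIoU values are equal too. Only the DIoU penalty separates them, and it favors the prediction whose center matches the ground-truth center.

```python
def diou_loss(pred, gt):
    """DIoU loss, 1 - IoU + d^2 / C^2, for axis-aligned (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    # d^2: squared Euclidean distance between the box centers
    d2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2) ** 2 + (
        ((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2) ** 2
    # C^2: squared diagonal length of the smallest enclosing box
    c2 = (max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2 + (
        max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2
    return 1.0 - inter / union + d2 / c2


gt = (0, 0, 10, 10)
corner = (0, 0, 4, 4)    # inside the ground truth, centers about 4.24 px apart
centered = (3, 3, 7, 7)  # inside the ground truth, same center
print(round(diou_loss(corner, gt), 3), round(diou_loss(centered, gt), 3))  # 0.93 0.84
```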

DIoU Loss Converges Faster Compared to IoU and GIoU Loss

Let us look at the effectiveness of DIoU loss compared to IoU and GIoU loss. 

DIoU Simulation Experiment

Here is a simulation experiment. There are seven target boxes (green) with the same fixed area and seven different aspect ratios. Anchor boxes are placed at 5000 points around the targets, as shown in the image above. At each point there are 49 anchors (seven sizes × seven aspect ratios), and each anchor is regressed to each of the seven targets. So each point has 7 × 7 × 7 = 343 regression cases, or 343 × 5000 = 1,715,000 regression cases in total.

Each of the IoU, GIoU, DIoU, and CIoU (discussed later) losses is used to regress the anchors to these targets. The right side of the image plots the loss over training iterations for each loss function: DIoU and CIoU converge much faster than IoU and GIoU.

DIoU has Lower Losses Compared to IoU and GIoU

Have a look at the image below.

Regression Error at the Final Step at Every Anchor Box Coordinate

The plot above shows the regression error at the final iteration for the anchors placed at each coordinate.

Why are IoU losses high?

With IoU loss, the error is high at points far from the target. This is expected: anchors placed far away may not overlap the target at all, so IoU is zero, the gradient vanishes, and IoU loss cannot optimize them any further.

Why are GIoU losses high at the horizontal and vertical orientations?

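GIoU loss reduces the error compared with IoU loss, but anchors placed in horizontal and vertical orientations relative to the target still tend to have large errors. The image below illustrates why: when the two boxes are aligned horizontally or vertically and overlap, the smallest enclosing box C can nearly coincide with A \cup B, so the GIoU penalty term is close to zero and GIoU loss behaves almost like IoU loss.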

GIoU Fails at Horizontal and Vertical Orientations

DIoU loss achieves much lower errors than GIoU loss because its penalty term, the normalized center distance d^2/C^2, stays nonzero whenever the centers differ, regardless of the anchor's orientation relative to the ground truth.
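To check this numerically, the sketch below computes both penalties for a prediction shifted horizontally relative to a ground truth of the same height. It uses a hypothetical helper `penalties` and corner-format boxes. The GIoU penalty is |C \setminus (A \cup B)| / |C|, and the DIoU penalty is d^2 / C^2, where C^2 is the squared diagonal of the enclosing box. The GIoU penalty comes out exactly zero, so GIoU loss reduces to IoU loss here, while the DIoU penalty is still positive.

```python
def penalties(pred, gt):
    """Return (GIoU penalty, DIoU penalty) for axis-aligned (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    # Width and height of the smallest enclosing box C
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    giou_penalty = (cw * ch - union) / (cw * ch)  # empty fraction of C
    d2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2) ** 2 + (
        ((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2) ** 2
    diou_penalty = d2 / (cw ** 2 + ch ** 2)  # normalized center distance
    return giou_penalty, diou_penalty


gt = (0, 0, 10, 10)
pred = (5, 0, 15, 10)  # same height, shifted 5 px horizontally
g, d = penalties(pred, gt)
print(g, round(d, 4))  # 0.0 0.0769
```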

CIoU (Complete IoU) Loss Function

CIoU loss extends DIoU loss with an additional term that penalizes the difference in aspect ratio between the predicted and ground-truth boxes.

The CIoU loss function:

 L_{CIoU} = 1 - IoU + \frac{d^2}{C^2} + \alpha v

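where d and C are defined as in DIoU, and v measures the aspect-ratio consistency between the predicted box (width w, height h) and the ground-truth box (width w^{gt}, height h^{gt}):

 v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2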

\alpha is a trade-off parameter and is defined as:

 \alpha = \frac{v}{(1-IoU) + v}

Because \alpha depends on IoU, the aspect-ratio term receives less weight when the boxes barely overlap (1 - IoU is large) and more weight as the overlap grows.
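Here is a minimal plain-Python sketch of the CIoU loss that follows the equations above for corner-format boxes. The function name `ciou_loss` is hypothetical. The guard on `alpha` is an assumption added here to avoid division by zero when the boxes coincide (IoU = 1 and v = 0). A training implementation would compute the same terms on tensors in an autodiff framework. The example prediction shares the ground-truth center, so its DIoU loss would be 1 - IoU = 0.4; CIoU adds a small aspect-ratio penalty on top.

```python
import math


def ciou_loss(pred, gt):
    """CIoU loss, 1 - IoU + d^2/C^2 + alpha * v, for (x1, y1, x2, y2) boxes."""
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    # IoU term
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    iou = inter / (w * h + w_gt * h_gt - inter)
    # Distance term d^2 / C^2 (same as DIoU)
    d2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2) ** 2 + (
        ((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2) ** 2
    c2 = (max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2 + (
        max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2
    # Aspect-ratio consistency v and trade-off weight alpha
    v = 4 / math.pi ** 2 * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0  # assumed guard: v = 0 means no penalty
    return 1 - iou + d2 / c2 + alpha * v


gt = (0, 0, 10, 10)
pred = (2, 0, 8, 10)  # same center as gt, but 6 x 10 instead of 10 x 10
print(round(ciou_loss(pred, gt), 4))  # 0.4014
```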

NMS (Non-Maximum Suppression) using DIoU

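DIoU can also replace IoU as the suppression criterion in NMS. Let M be the box with the highest score, B_i another candidate box with score s_i, and \epsilon the NMS threshold. DIoU-NMS can be formally defined as:

 s_i = \begin{cases} s_i, & IoU - \frac{d^2}{C^2} < \epsilon,\\ 0, & \text{otherwise} \end{cases}

where IoU and d^2/C^2 are computed between M and B_i. If two predicted boxes overlap considerably but their centers are far apart, they most likely belong to different objects, and the distance term keeps the lower-scoring box from being suppressed.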

IoU, GIoU, DIoU, and CIoU Loss Comparison

A quantitative comparison of YOLOv3 (Redmon and Farhadi) trained using L_{IoU} (baseline), L_{GIoU}, L_{DIoU}, and L_{CIoU}; (D) denotes the use of DIoU-NMS. Results are reported on the PASCAL VOC 2007 test set.

 

IoU vs. GIoU vs. DIoU vs. CIoU

DIoU/CIoU Convergence Compared to GIoU

DIoU/CIoU vs. GIoU Convergence

The first row is for GIoU, and the second is for DIoU. Green boxes are targets. Black boxes are anchors. Blue and red boxes are predictions for GIoU and DIoU loss, respectively.

The predictions converge to the target faster with DIoU loss than with GIoU loss.

Summary

  • The CIoU loss function outperforms GIoU and DIoU in the reported experiments.
  • CIoU loss is now commonly used for bounding box regression in object detection.
  • For NMS, using DIoU instead of IoU as the suppression criterion works better.
  • DIoU and CIoU converge much faster than GIoU.

