ICIoU: A Key Strategy for Improving Bounding Box Regression Accuracy in Convolutional Neural Networks


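Object detectors based on convolutional neural networks (CNNs) are popular in computer vision because they are simple and efficient. Their performance, however, depends heavily on the precision of the localization algorithm, and in particular on the choice of loss function. Traditional losses such as IoU (Intersection over Union) measure how much the predicted box overlaps the ground-truth box, but may not fully account for how well the two shapes match. The paper therefore proposes an improved loss function, Improved CIoU (ICIoU), which extends Complete Intersection over Union (CIoU).

The key idea of ICIoU is to consider not only the overlap between the predicted and ground-truth boxes but also how well their aspect ratios match. It additionally accounts for the influence of the predicted box on localization accuracy when the two boxes have the same aspect ratio, which strengthens the penalty term and improves the model's localization. The authors validated the method on the Udacity, PASCAL VOC, and MS COCO datasets using the one-stage detector YOLOv4. Compared with CIoU, ICIoU improved AP (average precision) by 0.57% and AP75 (AP at an IoU threshold of 0.75) by 0.12% on Udacity, AP by 0.26% and AP75 by 1.28% on PASCAL VOC, and AP by 0.06% and AP75 by 0.65% on MS COCO. The work was supported in part by the Shaanxi Provincial Key Laboratory of Industrial Automation. The results indicate that refining the loss function can improve the localization accuracy of CNN-based object detectors.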
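The limitation of plain IoU mentioned in the summary can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates; the function name `iou` and the example boxes are hypothetical and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: one ground truth and two differently shaped predictions.
gt = (0.0, 0.0, 4.0, 2.0)    # 4 wide, 2 high
wide = (0.0, 0.0, 8.0, 2.0)  # same aspect direction, twice as wide
tall = (0.0, 0.0, 4.0, 4.0)  # square, twice as high

print(iou(gt, wide), iou(gt, tall))  # 0.5 0.5
```

Both predictions receive the same IoU even though their shapes differ, which is the kind of ambiguity that aspect-ratio-aware losses such as CIoU and ICIoU aim to penalize.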

Received July 13, 2021, accepted July 22, 2021, date of publication July 26, 2021, date of current version August 3, 2021.

Digital Object Identifier 10.1109/ACCESS.2021.3100414

ICIoU: Improved Loss Based on Complete Intersection Over Union for Bounding Box Regression

XUFEI WANG (1,2) AND JEONGYOUNG SONG (Member, IEEE)

(1) Key Laboratory of Industrial Automation, School of Mechanical Engineering, Shaanxi University of Technology, Hanzhong 723000, China

(2) Department of Computer Engineering, Pai Chai University, Daejeon 35345, South Korea

Corresponding author: Jeongyoung Song (jysong@pcu.ac.kr)

This work was supported in part by the Shaanxi Provincial Key Laboratory of Industrial Automation Research Program under Grant 18JS020.

ABSTRACT Object detectors based on convolutional neural networks (CNNs) have been widely used in the field of computer vision because of their simplicity and efficiency. The average accuracy of the detection results of a CNN-based object detector is greatly affected by the loss function, and the precision of the localization algorithm in the loss function is the main factor affecting the result. Based on the complete intersection over union (CIoU) loss function, an improved penalty function is proposed to improve the localization accuracy. Specifically, the algorithm considers the match between the predicted and ground-truth bounding boxes more comprehensively by using the proportional relationship between the aspect ratios of the two boxes. For the case in which the two bounding boxes have the same aspect ratio, the factors by which the prediction box influences localization accuracy are also considered. In this way, the effect of the penalty function is strengthened and the localization accuracy of the network model is improved. This loss function is called Improved CIoU (ICIoU). Experiments on the Udacity, PASCAL VOC, and MS COCO datasets with the one-stage object detector YOLOv4 demonstrate the effectiveness of ICIoU in improving the localization accuracy of network models. Compared with CIoU, the proposed ICIoU improved average precision (AP) by 0.57% and AP75 by 0.12% on Udacity, AP by 0.26% and AP75 by 1.28% on PASCAL VOC, and AP by 0.06% and AP75 by 0.65% on MS COCO.
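To make the role of the aspect-ratio penalty concrete, the following is a minimal sketch of the standard CIoU loss that ICIoU builds on (overlap term, normalized centre distance, and aspect-ratio term). It is not the ICIoU formula, which this excerpt does not give. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """Standard CIoU loss for two axis-aligned boxes (x1, y1, x2, y2).

    Returns (loss, v), where v is the aspect-ratio consistency term.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap term: plain IoU.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Distance term: squared centre distance over the squared diagonal
    # of the smallest box enclosing both boxes.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 \
       + (max(py2, gy2) - min(py1, gy1)) ** 2 + eps

    # Aspect-ratio term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v, v


# Hypothetical boxes: the prediction is twice the size of the ground truth
# but has the same width/height ratio of 2.
loss_same, v_same = ciou_loss((0.0, 0.0, 4.0, 2.0), (0.0, 0.0, 2.0, 1.0))
# A square prediction against the same ground truth.
loss_diff, v_diff = ciou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 1.0))

print(v_same)      # 0.0: the aspect-ratio penalty vanishes
print(v_diff > 0)  # True
```

In the first case the aspect-ratio term is exactly zero although the predicted box is clearly the wrong size, so only the overlap and distance terms penalize it. This is the equal-aspect-ratio situation that the abstract says ICIoU treats explicitly.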

INDEX TERMS Bounding box regression, localization accuracy, loss function, object detection.

I. INTRODUCTION

Object detection is one of the key problems in computer vision. In recent years, convolutional neural networks (CNNs) have been increasingly applied in the field of computer vision [1]–[11]. When CNNs are used for object detection, a loss function is indispensable whether the task is posed as a regression or a classification problem. A loss function estimates the degree of inconsistency between a model's predicted value and the real value. The main task of model training is to use an optimization method to find the model parameters that minimize the loss function. Because the loss function determines what the optimal model is, it affects the performance of every object detector. The loss function generally consists of a bounding box regression term and a classification term. Computing the bounding box regression loss is the key step in object localization, multiobject detection, target tracking, and instance-level segmentation. In multiobject detection, deep CNNs predict the bounding boxes of candidate objects better than traditional region proposal methods. These networks include one-stage object detectors, such as the YOLO series [3]–[6] and the single shot multibox detector (SSD) [9]; two-stage object detectors, such as the regions with CNN features (R-CNN) series [11]–[14]; and even multistage object detectors, such as cascade R-CNN [15]. In these networks, intersection over union (IoU) loss has become the most popular evaluation measure for bounding box regression compared with focal loss and Ln-norm (e.g., L1, L2) loss [16], [17]. However, the IoU algorithm cannot detect the bounding box

The associate editor coordinating the review of this manuscript and approving it for publication was Sudipta Roy.


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

VOLUME 9, 2021
