Fig. 2. Comparison of objects detected by (a) YOLOv3 and (b) our proposed model.
Faster Region-based Convolutional Network (Faster R-CNN)
[20], Mask R-CNN [21] and You Only Look Once
(YOLO) [22], [23]. Trained on datasets in which all objects are labeled, these models can detect every object appearing in a traffic scene in real time, including cars, cyclists, traffic signs/lights, roads, pedestrians, and sky. Some image segmentation methods have also been deployed in commercial intelligent driving vehicles to detect and identify all objects and areas appearing in the driving environment.
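For concreteness, the following minimal sketch (ours, not from the cited works) runs an off-the-shelf pretrained detector on a single driving frame. It assumes a recent torchvision installation; the file name and score threshold are illustrative placeholders.

```python
import torch
import torchvision
from PIL import Image
from torchvision.transforms.functional import to_tensor

# Load an off-the-shelf Faster R-CNN pretrained on MS COCO (80 common classes).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# "traffic_frame.jpg" is a placeholder path for one driving-scene image.
frame = to_tensor(Image.open("traffic_frame.jpg").convert("RGB"))

with torch.no_grad():
    out = model([frame])[0]  # dict with "boxes", "labels", "scores"

# A general-purpose detector keeps everything above a confidence threshold,
# whether or not the object matters for the current driving task.
keep = out["scores"] > 0.5  # illustrative threshold
boxes, labels = out["boxes"][keep], out["labels"][keep]
```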
However, not all objects in driving scenes are critical and necessary for driving safety. As shown in Fig. 2(a), YOLOv3 detected all of the static cars parked on the side of the road, the pedestrians walking on the sidewalk, and some unrelated objects. We argue that these static or unrelated objects may be redundant information for driving safety, and an assisted or intelligent driving system may be distracted when too many redundant objects are present. For example, both the static cars parked along the sidewalk in the second/third rows and the moving cars in the opposite lanes in the third/fourth rows of Fig. 2(a) are irrelevant to the current driving task; they are redundant and distracting information for driving decision-making. By comparison, detecting the critical objects that are closely related to the current driving situation is more valuable, as our proposed model does in Fig. 2(b).
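The proposed model learns which objects are critical; purely to illustrate the gap it addresses, the hypothetical heuristic below filters a detector's boxes to those intersecting a central "ego corridor". The function name, corridor width, and image size are all assumptions for illustration, not part of any cited method.

```python
def in_ego_corridor(box, img_w, img_h, corridor_frac=0.4):
    """Hypothetical relevance test: keep a detection only if its box
    intersects a central corridor covering the lane ahead and lies in
    the lower (near-road) half of the frame. box = (x1, y1, x2, y2)."""
    left = img_w * (1 - corridor_frac) / 2
    right = img_w * (1 + corridor_frac) / 2
    x1, y1, x2, y2 = box
    return x2 > left and x1 < right and y2 > img_h / 2

# Reusing `boxes` from the previous sketch, with an assumed 1920x1080 frame.
critical = [b for b in boxes.tolist() if in_ego_corridor(b, 1920, 1080)]
```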
Furthermore, other state-of-the-art detection works have been proposed to detect salient objects in natural images. These works output binary saliency maps rather than bounding boxes, so they can be considered a form of image segmentation. Guo et al. [24] proposed a method to detect salient object regions in video via object proposals. A deep learning model was proposed to efficiently detect salient regions in video [25]. Wang et al. [26] presented a video salient object detection model based on geodesic distance and applied it to unsupervised video segmentation. In their follow-up work, the authors introduced an attentive saliency network (ASNet) [27] that learned to detect salient objects from fixations. Song et al. [28] proposed a fast video salient object detection model built on a novel recurrent network, the pyramid dilated bidirectional ConvLSTM (PDB-ConvLSTM). Guo et al. [29] proposed a computationally efficient method for spatiotemporally consistent salient object detection in videos. Hu et al. [30] explored ways to use visual attention (saliency) for object detection and tracking, although their method detects only vehicles. However, these salient object detection models for natural scenes are not suitable for traffic driving scenarios.
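To make the segmentation-versus-detection distinction concrete: a binary saliency map carries no boxes, and recovering them requires an extra connected-components pass, as in the sketch below (assuming OpenCV is available; the file name and area threshold are placeholders).

```python
import cv2
import numpy as np

# saliency_map: H x W float array in [0, 1] produced by a salient object
# detection model; the file name here is a placeholder.
saliency_map = np.load("saliency_map.npy")

# Threshold into the binary mask these models actually output...
mask = (saliency_map > 0.5).astype(np.uint8)

# ...then recover bounding boxes only as a separate post-processing step.
num, _, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
boxes = [
    (x, y, x + w, y + h)
    for x, y, w, h, area in stats[1:]  # row 0 is the background component
    if area > 50                       # illustrative minimum-area filter
]
```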
C. Saliency Attention and Object Detection Datasets
Many image saliency datasets have been released in the
past few years, improving the understanding of human visual
attention and pushing computational models forward. The
statistics of saliency attention and object detection datasets
are summarized in Table I. There are some natural-scene saliency image/video datasets, such as the MIT benchmark [31], the SALICON dataset [32], and Action in the Eye [38], but they do not contain specific driving sequences. Wang et al. [35], [39] built a large-scale benchmark called Dynamic Human Fixation 1K (DHF1K) for predicting human fixations during free viewing of dynamic natural scenes. DHF1K includes 1K video sequences annotated by 17 observers with an eye-tracking device. In addition, the authors proposed a novel video saliency model called the attentive CNN-LSTM network (ACLNet). Each video in DHF1K was manually annotated with a category label drawn from seven main categories: daily activity, sport, social activity, artistic performance, animal, artifact, and scenery. However, there are no traffic driving scenarios in this dataset.
On the other hand, many state-of-the-art datasets have been published for object detection tasks, including ImageNet [40], Pascal VOC [33], and MS COCO [34]. All objects present in the images are labeled in these datasets, and these objects are important for detection and tracking in daily life.
In the field of driving attention dataset research,
Xia et al. [36] proposed an in-lab driver attention dataset
named Berkeley DeepDrive Attention (BDD-A), which was
built upon braking event videos selected from a large-scale,
crowd-sourced driving video dataset. Recently, Fang et al. built a dataset for predicting driver attention in driving accident scenarios (DADA) [37] and designed a semantic context-induced attentive fusion network (SCAFNet). Alletto et al. recorded drivers' eye-tracking videos during actual driving and built a publicly available video dataset, DR(eye)VE [9]. DR(eye)VE is a good public dataset consisting of 74 videos collected from eight drivers.