1692 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 16, NO. 4, AUGUST 2015
emergency lane generated by the proposed system provides an
alternative option of PCS with steering for more stopping room
or less speed loss.
The rest of this paper is organized as follows. Section II
discusses the related work in detail. Section III presents a detailed
description of the proposed HMM-based road detection approach.
Section IV briefly introduces the adopted vehicle detection.
In Section V, the proposed contextual cor-
relation for both low-level detection improvement and high-
level road structure estimation is described. Section VI gives
the experimental results on a variety of typical but challenging
road scenarios, which demonstrate the effectiveness and
robustness of the proposed system. Finally, Section VII concludes
the paper and discusses future work.
II. RELATED WORK
ADAS is one of the fastest growing areas in automotive elec-
tronics. Since high-quality cameras are now available at very low
cost, many camera-based ADAS systems have been deployed
[5], [6]. In the proposed system, we also aim at a camera-based
solution for the LKA, ACC, and PCS functions in unmarked
urban scenarios, which requires robust detection of the road
[7]–[22] and vehicles [23]–[30] at the low level, and rational
estimation of road structures [33]–[35] at the high level.
1) Road Detection: The problem of vision-based road de-
tection has been studied for several decades. Some methods
used a monocular camera to extract the road region by em-
ploying specific features based on the road appearance [7]–[12].
Such appearance-based methods can work very well in certain
environments, even under adverse conditions [12]. However,
they lose effectiveness in cases where the roads do not
sufficiently correspond to the models of the a priori defined
features. Some other methods worked on a
sequence of temporally consecutive monocular images of the
scene, and made use of the displacement of pixels between two
consecutive images [13], [14]. These motion-based methods
can provide generic detection of the drivable roads and give
information about the displacement of the target and structure/
depth of the scene. However, they cannot work well on chaotic
roads when the camera is unstable and the estimation of the
optical flow is not robust enough.
Stereovision-based methods are also widely used for road
detection. Generally, they are more robust than monocular-
based ones, since they can triangulate feature points in 3-D
and cope better with scale loss and dynamic vehicle
movements. Given a stereo image pair, stereo
matching-based methods extract the 3-D structure of the scene
by solving the correspondence problem and computing the
disparity map. For example, 3-D urban reconstruction has been
demonstrated in [15] and [16]. Compared with these rather
holistic methods, dedicated terrain traversability estimation
methods [17]–[21] showed a better classification performance
with respect to the vehicle driving. The approach proposed here
belongs to this line of systems. Previously, we also developed
a road detection system in a Markov random field (MRF) by
finding the correspondences of the road pixels between the
image pairs based on the homography induced by the ground
plane [22]. Compared with the plane-induced homography, the
disparity map can provide more detailed information, particu-
larly for low-textured scenarios, so that greater accuracy and
robustness of road detection can be expected.
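As a concrete illustration of the correspondence problem mentioned above, the following sketch computes a dense disparity map for a synthetic stereo pair using naive SAD (sum of absolute differences) block matching. It is a minimal toy under invented parameters (image size, window size, disparity range), not the stereo pipeline of [15]–[21]:

```python
import numpy as np

def box3(a):
    """Aggregate a per-pixel cost image over a 3 x 3 window (box filter)."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def sad_disparity(left, right, max_disp):
    """Naive stereo matching: for every pixel, pick the horizontal shift d
    that minimizes the aggregated SAD between left and right patches."""
    h, w = left.shape
    costs = np.full((max_disp, h, w), np.inf, dtype=np.float32)
    for d in range(max_disp):
        # Compare left column (d + i) against right column i for shift d.
        diff = np.abs(left[:, d:] - right[:, : w - d])
        costs[d, :, d:] = box3(diff)
    return np.argmin(costs, axis=0)  # disparity map (larger = closer)

# Synthetic pair: the right view is the left view shifted by a known amount,
# so the matcher has a ground-truth disparity to recover.
rng = np.random.default_rng(0)
true_disp = 6
left = rng.integers(0, 256, (64, 96)).astype(np.float32)
right = np.zeros_like(left)
right[:, : 96 - true_disp] = left[:, true_disp:]

disp = sad_disparity(left, right, max_disp=16)
print(int(np.median(disp[:, true_disp:])))  # recovers the known shift: 6
```

Real systems replace the box filter with more sophisticated cost aggregation and add subpixel refinement and consistency checks, but the core search over horizontal shifts is the same.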
2) Vehicle Detection: Vehicle detection is a topic of great
interest. Both the academic community and the
automobile industry have contributed to the development of
different types of detection systems in order to improve traffic
safety with respect to vehicle-to-vehicle collisions. For ex-
ample, Sun et al. [23] gave a comprehensive review of vehicle
detection. In the early works, the symmetry and edge informa-
tion were used for detecting vehicles in the image [24], [25].
However, such methods failed in more challenging scenarios,
where vehicles present dramatic appearance changes depending
on the camera viewpoint and environmental conditions, and also
exhibit intraclass variability. In order to tackle these challenges,
two common solutions have been developed in recent years.
One is to employ robust features, since overall performance
of the system depends on the discriminative power of features
used in the detection algorithm. For example, the HOG feature
[26] has been considered one of the strongest features; it
captures the shape information of an object and is robust to
local variations. The other is to establish part-based models for
the target. Rather than trying to capture a global pattern of an
object with one template, part-based models focus on parts of
an object and, in consequence, provide more flexible and robust
representations. Recently, Felzenszwalb et al. demonstrated a
DPM that outperformed the single template model by using
a latent support vector machine (SVM) formulation in com-
bination with a variation of HOG features [4]. This approach
works very well when the nearby vehicles are fully visible.
However, vehicles are sometimes far from the host vehicle and,
consequently, the visual evidence is very weak. Furthermore,
vehicles are frequently occluded by other objects in traffic
scenes. In this case, some part models are not visible, yet
they still count toward the overall detection score. Thus, the
low scores of the occluded parts result in a low summed
score, thereby generating false negatives.
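The effect of occlusion on a summed part score can be seen in a toy calculation. All numbers below are invented for illustration; a real DPM additionally optimizes part placements with deformation costs and a bias term:

```python
# Toy DPM-style scoring: detection score = root filter response plus the
# part filter responses (deformation costs folded in; numbers invented).
root = 0.9
threshold = 1.5

parts_visible = [0.5, 0.6, 0.4, 0.5]     # all parts match well
parts_occluded = [0.5, 0.6, -1.2, -1.0]  # two occluded parts score strongly negative

score_visible = root + sum(parts_visible)    # above threshold: detected
score_occluded = root + sum(parts_occluded)  # below threshold: false negative

print(score_visible > threshold, score_occluded > threshold)  # True False
```

Because every part contributes to the sum regardless of visibility, two occluded parts are enough to pull an otherwise confident detection below the acceptance threshold.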
It is hard to solve these problems with the aforementioned
generic methods, since the observations of the targets them-
selves are weak. In the vision community, researchers have
attempted to improve object detection by correlating the con-
textual information in the image. For example, Torralba [27] ex-
tracted the semantic categories of the image, such as a coastline,
a landscape, or a room, and learned the average positions of
objects of interest within the image. Such positions could then
be used as a prior for object detection. Hoiem et al. [28] also clas-
sified the image into three main spatial classes, namely, ground,
vertical, and sky, and then trained a classifier using AdaBoost
for object detection with a coarse viewpoint prior derived from
the spatial context. Galleguillos et al. [29] incorporated two
types of context, i.e., co-occurrence and relative location, for
object categorization by maximizing the object label agreement
in a conditional random field. These methods showed obvious
detection improvement by using additional context information.
However, a major problem of these methods is that the context
of the objects of interest is learned from labeled databases
comprising images shot in a limited set of compositions.
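The use of a spatial prior can be illustrated with a toy example in the spirit of [27]: a detector score is reweighted by a prior on the normalized image row, so that a weak detection in an implausible location is suppressed. The prior values here are hand-invented for illustration; in practice they are learned from labeled data:

```python
# Toy spatial-context prior (values invented; real priors are learned).
def vehicle_location_prior(y_norm):
    """Hypothetical prior on vehicle presence vs. normalized image row
    (0.0 = top of the image, 1.0 = bottom)."""
    return 0.05 if y_norm < 0.4 else 0.8

appearance_score = 0.4  # the same weak appearance-only detector response

# Combining the detector evidence with the location prior keeps the
# plausible detection (low in the image) and suppresses the sky one.
score_on_road = appearance_score * vehicle_location_prior(0.7)
score_in_sky = appearance_score * vehicle_location_prior(0.2)
print(score_on_road > score_in_sky)  # True
```

The limitation noted above applies directly to this sketch: such a prior is only as good as the compositions represented in the training images it was learned from.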