Depth-Aware Salient Region Detection Method Based on Binocular Vision


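"Salient Region Detection Based on Binocular Vision" is a research paper that investigates how binocular vision can be used to detect salient regions in images. Salient region detection imitates the human visual system, which rapidly focuses on attractive or important objects in the visual environment. Over the past few decades, many visual attention models have been developed and optimized, but most of them concentrate on static monocular images, while stereo depth information, an important component of human perception, has been used relatively little. The paper proposes a region-based binocular saliency detection method that takes depth information into account. First, a disparity map is computed by comparing the differences between the left and right images, which captures the depth of objects. Next, the HSI (hue, saturation, intensity) color space is adopted, because this color model better reflects how the human eye perceives color. Then, the mean-shift algorithm is applied for image segmentation; it adaptively finds similar regions in the image and separates them. The study shows that the proposed region-based saliency computation can effectively detect salient regions. By combining depth information with color features, the method can identify salient targets more accurately in complex visual scenes. In addition, by exploiting the depth information from stereo vision, the method can distinguish foreground from background and improve detection accuracy, which matters for applications such as autonomous driving, robot navigation, and video surveillance. The paper may further discuss experimental results, compare its performance with other existing methods, and suggest future research directions, such as improving computational efficiency, detecting salient regions in dynamic scenes, and extending the method to other vision tasks. This research offers a new perspective and theoretical support for applying binocular vision to saliency detection and contributes to the development of related fields.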
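The method works in the HSI color space, but the exact conversion the authors use is not shown in this excerpt. The sketch below assumes the common geometric RGB-to-HSI formulas for RGB values in [0, 1], with hue scaled to [0, 1); the function name `rgb_to_hsi` is hypothetical.

```python
import numpy as np


def rgb_to_hsi(rgb):
    """Convert RGB values in [0, 1] (shape (..., 3)) to HSI.

    Assumes the common geometric formulas; hue is scaled to [0, 1).
    """
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-12
    total = r + g + b
    intensity = total / 3.0
    # Saturation: 1 - 3 * min(R, G, B) / (R + G + B), defined as 0 for black.
    min_rgb = np.minimum(np.minimum(r, g), b)
    saturation = np.where(total > eps,
                          1.0 - 3.0 * min_rgb / np.maximum(total, eps), 0.0)
    # Hue angle from the arccos formula; the radicand is >= 0 in exact
    # arithmetic, so clamp tiny negative round-off before the square root.
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt(np.maximum((r - g) ** 2 + (r - b) * (g - b), 0.0))
    theta = np.arccos(np.clip(num / np.maximum(den, eps), -1.0, 1.0))
    hue = np.where(b <= g, theta, 2.0 * np.pi - theta) / (2.0 * np.pi)
    # Hue is undefined for gray pixels (R = G = B); set it to 0 by convention.
    hue = np.where(den > eps, hue, 0.0)
    return np.stack([hue, saturation, intensity], axis=-1)


print(rgb_to_hsi([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.5]]))
```

Separating chromatic information (H, S) from intensity (I) is what makes this space convenient for the color features that the segmentation step then groups.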

Salient Region Detection Based on Binocular Vision

Zhong LIU, Weihai CHEN, Yuhua ZOU, Xingming WU

School of Automation Science and Electrical Engineering

BeiHang University

Beijing 100191, China.

lzpro@126.com, whchenbuaa@126.com, chenyusiyuan@126.com

Abstract: Selective visual attention is a mechanism of the primate visual system for rapidly focusing on attractive objects or regions in the visual environment. Numerous visual attention models have been developed and optimized over the past decades. Most existing models concentrate on static monocular images, and little attention has been devoted to stereo depth information, which is an important aspect of human perception. This paper proposes a region-based binocular saliency detection approach that considers depth information. The difference between the left and right images is used to compute a disparity map and a coarse saliency map. The hue, saturation, and intensity (HSI) color space is adopted, and the mean-shift algorithm is used for image segmentation. This study shows that the proposed region-based saliency computation method can effectively detect salient regions and, because of its simplicity, is more suitable for real-time applications such as obstacle detection and visual navigation.

Keywords: saliency; visual attention; binocular; segmentation

I. INTRODUCTION

Selective visual attention is one of the most important and effective mechanisms of the primate visual system. It can be considered a biological process that selects the most valuable portion of a large amount of visual information for further processing. This remarkable function lets primates rapidly direct their gaze to interesting things, such as fire, light, food, and other attractive regions. Since saliency is a crucial factor in human visual tasks, it has long been a research topic of great interest in physiology, psychology, and neural systems. Although a large amount of effort has been made, the underlying neural mechanisms of visual saliency remain unclear. Nevertheless, some evidence sheds light on the approximate visual attention process. Visual information is processed along two parallel pathways, a dorsal stream and a ventral stream. The former is related to focusing attention on regions or objects in a scene; the latter is responsible for identification and recognition tasks. Biological visual selection is usually divided into two complementary mechanisms. One is fast, pre-attentive, bottom-up visual attention; the other is slower, task-dependent, top-down visual attention. This paper considers the rapid, saliency-driven, bottom-up attention.

Over the past decades, numerous computational models of visual attention have been proposed and many different algorithms have been developed. These algorithms can be broadly classified as biologically based, purely computational, or a combination of the two. Most existing bottom-up attention models construct a saliency map that reflects the salience of each key region in a scene. The model of Itti et al. [1] is derived from a biologically plausible architecture based on the neurobiological framework introduced by Koch and Ullman [2]. Itti's model computes saliency maps for luminance, color, and orientation features at different scales, following feature integration theory. The various scales are then used to perform center-surround operations [3] with a Difference of Gaussians (DoG) approach. Next, the center-surround maps are blended to produce two conspicuity maps, one aggregating color information and the other aggregating intensity information. Finally, these two maps are blended into a saliency map. Because of its clear biological grounding, Itti's model has been widely applied in fields such as image compression, object detection, and image segmentation.
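The center-surround idea can be illustrated with a single-resolution approximation on an intensity image: each center-surround map is |G(sigma_c) * I - G(sigma_s) * I|, where G(sigma) * I denotes Gaussian blurring, and the maps are summed and normalized. This is a simplification assumed for illustration only; Itti's model works across pyramid levels and several feature channels, and the sigma pairs and function names here are arbitrary, hypothetical choices.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge-replicated borders."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()

    def conv1d(v):
        return np.convolve(v, kernel, mode="valid")

    padded = np.pad(np.asarray(img, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(conv1d, 1, padded)  # blur along x
    return np.apply_along_axis(conv1d, 0, rows)    # blur along y


def dog_center_surround(intensity, sigma_pairs=((1.0, 4.0), (2.0, 8.0))):
    """Sum of |center - surround| maps over (sigma_c, sigma_s) pairs, scaled to max 1."""
    img = np.asarray(intensity, dtype=float)
    sal = np.zeros_like(img)
    for sigma_c, sigma_s in sigma_pairs:
        sal += np.abs(gaussian_blur(img, sigma_c) - gaussian_blur(img, sigma_s))
    peak = sal.max()
    return sal / peak if peak > 0 else sal
```

A small bright patch on a uniform background produces a strong response because its center blur differs sharply from its surround blur, while uniform areas far from the patch give zero response.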

Achanta et al. [4] proposed a purely computational model that computes local multiscale color and luminance feature contrast to generate a saliency map. Ma and Zhang [5] propose an alternative local contrast-based model, not based on any biological model, that obtains a saliency map by summing the differences between each image pixel and the pixels in its small surrounding neighborhood. Bruce and Tsotsos [6], [7] use Shannon's self-information measure to compute visual saliency; their approach is based on information maximization theory and represents a biologically plausible model of saliency detection. Harel's model [8] is graph-based, computing saliency from distance-weighted multi-scale feature dissimilarity maps.
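The local-contrast idea attributed to Ma and Zhang [5] can be sketched as follows: each pixel's saliency is the sum of feature-space distances to its neighbors in a small window. The Euclidean distance, window radius, edge-replicated borders, final normalization, and the function name `local_contrast_saliency` are assumptions for illustration, not details taken from [5].

```python
import numpy as np


def local_contrast_saliency(img, radius=1):
    """Sum of Euclidean feature distances to each pixel's (2r+1)^2 neighborhood.

    Accepts a grayscale (H, W) or multi-channel (H, W, C) image; the result
    is scaled so the most salient pixel has value 1.
    """
    img = np.asarray(img, dtype=float)
    if img.ndim == 2:
        img = img[..., None]  # treat grayscale as a single feature channel
    h, w = img.shape[:2]
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    sal = np.zeros((h, w))
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            neighbor = padded[dy:dy + h, dx:dx + w]
            sal += np.linalg.norm(img - neighbor, axis=-1)
    peak = sal.max()
    return sal / peak if peak > 0 else sal
```

Because every pixel is compared with its neighbors individually, the cost grows with the window area, which illustrates why pixel-level methods become expensive as the neighborhood widens.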

Guo and Zhang [9] introduce a model using the phase spectrum of the quaternion Fourier transform (PQFT), in which each pixel of the image is represented by a quaternion consisting of color, intensity, and motion features. Bian and Zhang [15] use spectral whitening (SW) as a normalization procedure that represents salient features and localized motion. This approach effectively suppresses redundant background information and ego-motion, which reflects a principle of the human visual system.
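Both spectral approaches above discard most amplitude information. A minimal single-channel sketch, assuming a grayscale input, keeps only the Fourier phase (that is, it sets every amplitude to 1, one simple form of spectral whitening), inverts the transform, and smooths the squared magnitude. This is a simplification: PQFT [9] encodes color, intensity, and motion as a quaternion per pixel, and the smoothing width and function names here are hypothetical choices.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge-replicated borders."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()

    def conv1d(v):
        return np.convolve(v, kernel, mode="valid")

    padded = np.pad(np.asarray(img, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(conv1d, 1, padded)
    return np.apply_along_axis(conv1d, 0, rows)


def phase_spectrum_saliency(gray, sigma=2.0):
    """Phase-only Fourier reconstruction, squared, smoothed, scaled to [0, 1]."""
    f = np.fft.fft2(np.asarray(gray, dtype=float))
    whitened = np.exp(1j * np.angle(f))          # unit amplitude, original phase
    recon = np.abs(np.fft.ifft2(whitened)) ** 2  # energy of the phase-only image
    sal = gaussian_blur(recon, sigma)
    span = sal.max() - sal.min()
    return (sal - sal.min()) / span if span > 0 else np.zeros_like(sal)
```

Flattening the amplitude spectrum removes the energy that repetitive, uniform background content concentrates in a few frequencies, so the reconstruction emphasizes locations where the phase structure changes, such as object boundaries.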

Most current visual attention models process monocular images. The procedures proposed by these models are mostly computationally expensive, because the corresponding processes carried out in the brain are highly complex. The majority of previous research on visual saliency detection focused on computing the saliency of each pixel, which yields results with low resolution and poorly defined borders and makes them expensive to compute. To simplify the computational process and refine the salient region boundaries, a region-based approach is proposed

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61075075 and 61175108, and by the National High Technology Research and Development Program of China under Grant No. 2011AA040902.


978-1-4577-2119-9/12/$26.00 ©2011 IEEE
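The introduction motivates a region-based approach, and the abstract states that mean-shift segmentation is used; the excerpt ends before the method itself is described. The sketch below therefore only illustrates the generic idea under that assumption: given any pixel-level saliency map (for example, a coarse disparity-based one) and a label map from a segmentation such as mean shift, every pixel receives the mean saliency of its region. The function name `region_saliency` and the averaging rule are assumptions.

```python
import numpy as np


def region_saliency(pixel_saliency, labels):
    """Replace each pixel's saliency with the mean saliency of its segment."""
    sal = np.asarray(pixel_saliency, dtype=float)
    labels = np.asarray(labels)
    # Map arbitrary segment ids to 0..K-1 so that bincount can index them.
    _, inverse = np.unique(labels.ravel(), return_inverse=True)
    sums = np.bincount(inverse, weights=sal.ravel())
    counts = np.bincount(inverse)
    return (sums / counts)[inverse].reshape(labels.shape)


pixel_saliency = np.array([[1.0, 3.0], [0.0, 0.5]])
labels = np.array([[5, 5], [7, 7]])
print(region_saliency(pixel_saliency, labels))
```

Averaging within segments gives each region a single value, so the salient area inherits the segmentation's boundaries instead of the blurry borders of a per-pixel map, and only one value per region needs to be stored or compared afterwards.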


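Tomato body feature detection system based on binocular vision and deep learning.pdf

Obstacle height detection based on binocular vision

Salient object detection method based on binocular vision

Research on 3D object detection algorithms based on binocular vision.docx

Research on obstacle detection for mobile robots based on binocular vision

Fast human head detection method based on binocular stereo vision

Salient region detection method based on binocular vision

Real-time detection and localization of colored targets based on binocular vision

Research on SLAM technology based on binocular vision 1

Autonomous driving algorithms based on binocular vision.pdf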
