978-1-4673-0311-8/12/$31.00 ©2012 IEEE
Regions of Interest Extraction Based on HSV
Color Space
Zhong LIU Weihai CHEN Yuhua ZOU
School of Automation Science and Electrical Engineering
BeiHang University
Beijing 100191, China.
lzpro@126.com, whchenbuaa@126.com,
chenyusiyuan@126.com
Cun HU
China Satellite Maritime Tracking and Controlling
Department
No. 575, Binjiang Middle Road,
Jiangyin, Jiangsu 214431, China
Abstract—In this paper, a simple method to extract regions of
interest (ROI) from images is proposed. In the field of image
processing, intensity, color, and orientation are the features most
commonly used for saliency map generation in visual attention
models. However, texture can also contribute to the guidance of
attention in a bottom-up model, so we consider texture contrast
as a component of the final saliency map. The hue, saturation, and
value (HSV) color space is adopted in this paper for its good
capability of representing colors as humans perceive them and its
computational simplicity. Moreover, a binocular stereo image pair
is adopted as the source image. The results show that the proposed
saliency computation method can effectively detect salient
regions, and that it is well suited to environmental perception and
cognition, object detection, and mobile robot navigation.
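As background to the HSV representation used here, the standard RGB-to-HSV conversion can be sketched in a few lines of NumPy; this is a generic textbook formula, not the paper's own code, and it assumes RGB inputs normalized to [0, 1]:

```python
import numpy as np

def rgb_to_hsv(img):
    """Convert an RGB image (floats in [0, 1], shape HxWx3) to HSV.

    H is returned in degrees [0, 360); S and V lie in [0, 1].
    """
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    v = img.max(axis=-1)                       # value = max channel
    c = v - img.min(axis=-1)                   # chroma = max - min
    s = np.where(v > 0, c / np.maximum(v, 1e-12), 0.0)

    # Hue depends on which channel is the maximum.
    safe_c = np.maximum(c, 1e-12)
    h = np.where(v == r, ((g - b) / safe_c) % 6,
        np.where(v == g, (b - r) / safe_c + 2,
                         (r - g) / safe_c + 4))
    h = np.where(c == 0, 0.0, h * 60.0)        # grey: hue undefined, use 0
    return np.stack([h, s, v], axis=-1)
```

The piecewise hue formula is what makes HSV cheap to compute compared with perceptually uniform spaces, which is the "simplicity of computation" advantage noted above.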
Keywords—saliency, regions of interest, visual attention,
segmentation
I. Introduction
Visual information is one of the most important factors in
human perception of the environment, and most daily behaviors,
such as obstacle detection, object searching, and danger
avoidance, rely on the visual system. In the past few decades,
research on mobile robots has received increasing attention
because of their wide range of applications. Visual sensors are
broadly applied in the field of autonomous mobile robots for
their ability to collect abundant information at low cost. But a
large volume of visual information consumes a large amount of
storage space and reduces processing speed. This deficiency
limits the application of visual sensors, especially in
real-time systems. Primates have the ability to rapidly focus
their attention on interesting objects in complex natural
environments. This mechanism, called visual selective attention,
is a biological process of selecting the most valuable portion
of a large amount of visual information for further processing.
This remarkable function lets primates direct their gaze to
particular objects without browsing the whole visual scene, and
it sheds light on the problem of processing a significant amount
of visual information within a restricted time.
Since saliency is a crucial factor in human visual tasks, it has
long been a research topic of great interest studied by
researchers in physiology, psychology, and neural systems.
Although a great deal of effort has been made, the underlying
neural mechanisms of visual saliency remain unclear, but the
available evidence outlines the attention process approximately.
A number of regions of the brain participate in visual
attention. Visual information proceeds along two parallel
pathways: a dorsal stream and a ventral stream. The former is
associated with focusing attention on regions or objects in a
scene; the latter is responsible for identification and
recognition tasks. Biological visual selection is usually
divided into two complementary mechanisms. One is fast,
pre-attentive, stimulus-driven, bottom-up visual attention. The
other is slower, goal-directed, top-down visual attention, which
is task-dependent. In this paper, low-level bottom-up visual
attention is considered.
Over the past decades, numerous visual attention models have
been proposed. Most existing bottom-up attention models adopt
low-level features such as intensity, color, and orientation as
attention cues. Itti et al. [1] developed a visual attention
model based on a biologically plausible architecture of the
early primate visual system. Itti's model computes saliency maps
for luminance, color, and orientation features at different
scales, following the feature integration theory of Treisman and
Gelade [8]. The various scales are then used to perform
center-surround operations [3] using a Difference of Gaussians
(DoG) approach. The center-surround maps are then blended to
produce two conspicuity maps, one aggregating color and the
other aggregating intensity information. Finally, these two maps
are blended into a saliency map. Because of its clear biological
grounding, Itti's model has been widely applied in a variety of
fields, such as image compression, object detection, and image
segmentation.
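The center-surround DoG operation described above can be sketched in NumPy as follows. This is a simplified single-scale illustration, not Itti's full multiscale pyramid, and the sigma values are illustrative assumptions:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel, normalized to sum to 1."""
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur of a 2-D array (zero-padded borders)."""
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def center_surround(img, sigma_c=1.0, sigma_s=4.0):
    """Difference-of-Gaussians center-surround response.

    A fine ("center") blur minus a coarse ("surround") blur highlights
    pixels that differ from their neighborhood, the core idea behind
    Itti-style feature maps.
    """
    return np.abs(blur(img, sigma_c) - blur(img, sigma_s))
```

A uniform region produces (near-)zero response, while an isolated bright spot produces a strong one, which is exactly the pop-out behavior the center-surround operation is meant to capture.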
Achanta et al. [4] proposed a purely computational model
that computes local multiscale color and luminance feature
contrast to generate the saliency map. Guo and Zhang [9]
introduced a model using the phase spectrum of the quaternion
Fourier transform (PQFT), in which each pixel of the image is
represented by a quaternion consisting of color, intensity,
and motion features. Hu et al. [9] used texture as a feature for
a visual attention model, analyzing the contextual texture
with the Gabor wavelet transform.
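Hu et al.'s Gabor-wavelet analysis is not reproduced here, but the general idea of measuring texture with oriented Gabor filters can be sketched as a per-pixel energy map. The sketch below uses complex Gabor kernels and FFT-based convolution; the orientations, wavelength, and Gaussian width are all illustrative assumptions, not the cited authors' parameters:

```python
import numpy as np

def gabor_kernel(sigma, theta, lam, radius=None):
    """Complex Gabor kernel: a plane wave at angle `theta` with
    wavelength `lam`, windowed by a Gaussian of width `sigma`."""
    if radius is None:
        radius = int(3 * sigma)
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
            * np.exp(2j * np.pi * xr / lam))

def conv2_same(img, k):
    """'Same'-size 2-D convolution via FFT (circular boundary)."""
    kh, kw = k.shape
    pad = np.zeros(img.shape, dtype=complex)
    pad[:kh, :kw] = k
    # Shift the kernel center to (0, 0) so output aligns with input.
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(pad))

def texture_energy(img, sigma=2.0, lam=8.0,
                   thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Sum of squared Gabor magnitudes over several orientations:
    a crude per-pixel texture-energy map."""
    energy = np.zeros(img.shape)
    for theta in thetas:
        resp = conv2_same(img, gabor_kernel(sigma, theta, lam))
        energy += np.abs(resp) ** 2     # quadrature pair: phase-invariant
    return energy
```

Using the complex (quadrature) kernel makes the energy insensitive to the phase of the underlying pattern, so striped regions light up uniformly rather than only at particular phase alignments; a contrast map of this energy could then serve as the texture component of a saliency map.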
Texture is a basic cue for primates to recognize objects.
Furthermore, texture is useful to capture attention in images
containing small objects present in a cluttered background.
Natural scenes contain significant variation in texture contrast
This work is supported by the National High Technology Research and
Development Program of China under Grant No. 2011AA040902 and the
National Natural Science Foundation of China under Grant Nos. 61075075
and 61175108.