Proceedings of the 2012 International Conference on Wavelet Analysis and Pattern Recognition, Xian, 15-17 July, 2012
A SALIENT HIERARCHICAL MODEL FOR OBJECT RECOGNITION
WEI-BIN YANG, BIN FANG, ZHAO-WEI SHANG, BO LIN
School of Computer Science, Chongqing University, Chongqing, China
E-MAIL: ywb@cqu.edu.cn
Abstract:
Image saliency attempts to describe the most conspicuous part of
an input image by mimicking the human visual selective attention
mechanism. Naturally, it can be adopted to improve object
recognition. To demonstrate the effectiveness of saliency in
object recognition, this paper proposes a salient hierarchical
model. First, the traditional saliency model is modified for more
robust saliency estimation. Second, the visual saliency detection
method is combined with the Hierarchical Maximization model to
provide more useful visual information for classification.
Experimental results show that the improved saliency model
extracts conspicuity more accurately, and that the proposed
salient hierarchical model outperforms the Hierarchical
Maximization model.
Keywords:
Image saliency; visual cortex; hierarchical model; object
recognition
1. Introduction
Understanding how humans look at and recognize objects is an
important issue in computer vision and pattern recognition.
Two main research topics involved are visual saliency
detection and object recognition. These topics are mostly
studied separately. However, since both tasks are inspired by
the human visual system and visual cortex, it is reasonable to
believe that progress in one may benefit the other. Therefore,
a robust saliency model and an effective way of combining it
with recognition may be the key to attention-based object
recognition.
Visual saliency is believed to drive human fixation
behavior during free viewing by attracting visual attention
in a bottom-up way. Moreover, saliency also appears to
determine which details humans find interesting in visual
scenes [1]. The most influential computational framework
for estimating visual saliency was proposed by Itti et al. [2],
who implemented and further developed the physiologically
inspired saliency-based model of visual attention introduced
by Koch and Ullman [3]. Itti's saliency model first computes
feature maps for color, intensity and orientation using a
center-surround operator across different
scales, and then generates the saliency map by
normalization and summation on these feature maps.
Achanta et al. [4] used features of color and luminance to
detect salient regions with well-defined boundaries.
Goferman et al. [5] presented a saliency detector that
computes the dissimilarity between different image
patches over four scales. Cheng et al. [6] proposed a global
method to detect visual saliency by measuring the
dissimilarity between different image regions, which
obtained excellent performance on salient object detection.
Hou et al. [7] proposed an image descriptor, called the image
signature, to approximate the foreground of an image using
the Discrete Cosine Transform and the Inverse Discrete Cosine
Transform.
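As an illustration of the image signature idea, the following is a minimal sketch and not the implementation used in [7] or in this paper. It assumes the common formulation in which the sign of the DCT coefficients is kept, the inverse DCT is applied, and the squared reconstruction is taken as the saliency map; the smoothing step that usually follows is omitted. All function names are hypothetical, and the transform is written in plain Python for small grayscale images only.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis, returned as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct2(img):
    """2-D DCT: C_h * img * C_w^T."""
    ch, cw = dct_matrix(len(img)), dct_matrix(len(img[0]))
    return matmul(matmul(ch, img), transpose(cw))

def idct2(coef):
    """Inverse 2-D DCT: C_h^T * coef * C_w (the basis is orthonormal)."""
    ch, cw = dct_matrix(len(coef)), dct_matrix(len(coef[0]))
    return matmul(matmul(transpose(ch), coef), cw)

def sign(v):
    return (v > 0) - (v < 0)

def signature_saliency(img):
    """Assumed formulation: saliency = IDCT(sign(DCT(img))) squared."""
    signature = [[sign(c) for c in row] for row in dct2(img)]
    recon = idct2(signature)
    return [[v * v for v in row] for row in recon]

# Usage: a single bright pixel on an empty 8 x 8 background.
image = [[0.0] * 8 for _ in range(8)]
image[3][3] = 1.0
saliency = signature_saliency(image)
peak = max(((r, c) for r in range(8) for c in range(8)),
           key=lambda rc: saliency[rc[0]][rc[1]])
```

In this toy case the strongest response falls on the bright pixel, which is the sense in which the signature approximates a spatially sparse foreground.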
In addition, based on our knowledge of the visual cortex,
many studies focus on biologically plausible methods for
object class recognition. Serre et al. [8] proposed a
computational model (Hierarchical Maximization, HMAX)
based on the feedforward path of object recognition in
cortex, which accounts for the first 100-200 milliseconds of
processing in the ventral stream of primate visual cortex [9].
The HMAX model obtains promising results on some of the
standard classification datasets. Mutch et al. [10] improved
the HMAX model by incorporating additional biologically
motivated properties, such as sparsity and localized
intermediate-level features.
To show that visual saliency is useful for object
recognition, Riesenhuber et al. [11] applied Itti's saliency
model together with the SIFT descriptor. Han et al. [12]
combined attention and recognition by replacing the first
layer of the HMAX architecture with a saliency network. In
this paper, we take a different approach: we use a saliency
model to guide the learning process and to define the
principle for choosing training samples in the HMAX model.
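One way to picture saliency-guided selection of training samples is sketched below. This is only an illustrative assumption, not the selection rule proposed in this paper: patch centers are drawn with probability proportional to their saliency value, with a uniform fallback when the map is empty. The function name, its parameters and the proportional rule are all hypothetical, and an odd patch size is assumed.

```python
import random

def sample_patch_centers(saliency, n_patches, patch_size, seed=0):
    """Draw patch centers with probability proportional to saliency.

    Hypothetical helper: `saliency` is a 2-D list of non-negative
    values, `patch_size` is assumed to be odd.
    """
    h, w = len(saliency), len(saliency[0])
    half = patch_size // 2
    # Keep only centers whose patch lies fully inside the map.
    candidates = [(r, c)
                  for r in range(half, h - half)
                  for c in range(half, w - half)]
    weights = [saliency[r][c] for r, c in candidates]
    if sum(weights) <= 0:
        # No salient location at all: fall back to uniform sampling.
        weights = [1.0] * len(candidates)
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    return rng.choices(candidates, weights=weights, k=n_patches)

# Usage: only one location is salient, so every draw lands on it.
toy_map = [[0.0] * 6 for _ in range(6)]
toy_map[2][3] = 1.0
centers = sample_patch_centers(toy_map, n_patches=5, patch_size=3)
```

Sampling in proportion to saliency, rather than keeping only the top locations, is a design choice of this sketch: it still allows some patches from less conspicuous regions.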
The rest of this paper is organized as follows. Section
2 introduces the proposed salient hierarchical model in
detail. Section 3 evaluates the performance of the improved
saliency model and the proposed salient hierarchical model.
Conclusions are given in Section 4.