Stereo Matching with Color-Weighted Correlation, Hierarchical Belief
Propagation and Occlusion Handling
Q`ıngxi´ong Y´ang Liang Wang Ruigang Yang Henrik Stew´enius David Nist´er
Center for Visualization and Virtual Environments
Department of Computer Science, University of Kentucky
http://www.vis.uky.edu/
∼
liiton/
Abstract
In this paper, we formulate an algorithm for the stereo
matching problem with careful handling of disparity, dis-
continuity and occlusion. The algorithm works with
a global matching stereo model based on an energy-
minimization framework. The global energy contains two
terms, the data term and the smoothness term. The data
term is first approximated by a color-weighted correlation,
then refined in occluded and low-texture areas in a repeated
application of a hierarchical loopy belief propagation algo-
rithm. The experimental results are evaluated on the Mid-
dlebury data set, showing that our algorithm is the top per-
former.
1. Introduction
Stereo is one of the most extensively researched topics in
computer vision. Stereo research has recently experienced
somewhat of a new era, as a result of publically available
performance testing such as the Middlebury data set [
11],
which has allowed researchers to compare their algorithms
against all the state-of-the-art algorithms.
In this paper, we describe our stereo algorithm, which
is currently evaluating as the top performer on the Mid-
dlebury data set. The algorithm springs from the popular
energy minimization framework that is the basis for most
of the algorithms on the Middlebury top-list, such as graph
cuts [
4, 10] and belief propagation [12, 13]. In this frame-
work, there is typically a data term and a smoothness term,
where the data term consists of the matching error implied
by the extracted disparity map, and the smoothness term
encodes the prior assumption that world surfaces are piece-
wise smooth.
However, the algorithm presented in this paper departs
somewhat from the normal framework, in that in the final
stages of the algorithm, the data term is updated based on
the current understanding of which pixels in the reference
image are occluded or unstable due to low texture.
The paper is organized as follows: Section
2 gives a
high-level overview of the approach. In Section
3 we then
give the detailed equations for all the building blocks. Sec-
tion 4 reports results supporting the claims that the algo-
rithm is currently the strongest available on the Middlebury
data set. Section
5 concludes.
2. Overview of the Approach
The algorithm can be partitioned into three blocks, ini-
tial stereo (Figure 1), pixel classification (Figure 2) and iter-
ative refinement (Figure
3). In the initial stereo, see Figure
1, the correlation volume is first computed. A basic way
to construct the correlation volume is to compute the abso-
lute difference of luminances of the corresponding pixels in
the left and right images, but there are many other meth-
ods for correlation volume construction. For instance, Sun
et al. [
12] use Birchfield and Tomasi’s pixel dissimilarity
[1] to construct the correlation volume, and Felzenszwalb
[
6] suggests to smooth the image first before calculating the
pixel difference. In this work, we are using color-weighted
correlation to build the correlation volume, in a similar man-
ner as was recently described by Yoon and Kweon [
17]. The
color-weighting makes the match scores less sensitive to oc-
clusion boundaries by using the fact that occlusion bound-
aries most often cause color discontinuities as well. The
initial stereo is run in turn with both the left and the right
image as reference images. This is done just to support
a subsequent mutual consistency check (often called left-
right check) that takes place in the pixel classification block.
Functions E
L
S
and E
R
S
defining the smoothness costs in the
left and right reference images, respectively, are determined
based on the color gradients in the input images. The left
and right smoothness costs and the left and right correlation
costs are then optimized using two separate hierarchical be-
lief propagation processes. The hierarchical belief propaga-
tion is performed in a manner similar to Felzenszwalb [
6],
resulting in the initial left and right disparity maps D
(0)
L
and
1