A Non-Local Cost Aggregation Method for Stereo Matching
Qingxiong Yang
City University of Hong Kong
http://www.cs.cityu.edu.hk/
˜
qiyang/
Abstract
Matching cost aggregation is one of the oldest and still
popular methods for stereo correspondence. While effec-
tive and efficient, cost aggregation methods typically ag-
gregate the matching cost by summing/averaging over a
user-specified, local support region. This is obviously on-
ly locally-optimal, and the computational complexity of the
full-kernel implementation usually depends on the region
size. In this paper, the cost aggregation problem is re-
examined and a non-local solution is proposed. The match-
ing cost values are aggregated adaptively based on pixel
similarity on a tree structure derived from the stereo im-
age pair to preserve depth edges. The nodes of this tree
are all the image pixels, and the edges are all the edges
between the nearest neighboring pixels. The similarity be-
tween any two pixels is decided by their shortest distance
on the tree. The proposed method is non-local as every n-
ode receives supports from all other nodes on the tree. As
can be expected, the proposed non-local solution outper-
forms all local cost aggregation methods on the standard
(Middlebury) benchmark. Besides, it has great advantage
in extremely low computationalcomplexity: only a total of 2
addition/subtraction operations and 3 multiplication oper-
ations are required for each pixel at each disparity level. It
is very close to the complexity of unnormalized box filtering
using integral image which requires 6 addition/subtraction
operations. Unnormalized box filter is the fastest local cost
aggregation method but blurs across depth edges. The pro-
posed method was tested on a MacBook Air laptop comput-
er with a 1.8 GHz Intel Core i7 CPU and 4 GB memory. The
average runtime on the Middlebury data sets is about 90
milliseconds, and is only about 1.25× slower than unnor-
malized box filter. A non-local disparity refinement method
is also proposed based on the non-local cost aggregation
method.
1. Introduction
Stereo correspondence has traditionally been, and con-
tinues to be, one of the most extensively researched topics
in computer vision. Stereo algorithms generally perform
(subsets of) the following four steps:
1. matching cost computation;
2. cost (support) aggregation;
3. disparity computation/optimization; and
4. disparity refinement.
Scharstein and Szeliski [21] developed a taxonomy and
categorization scheme for stereo algorithms, and separat-
ed different stereo algorithms into two broad classes: local
and global algorithms. In a local algorithm, the disparity
computation at a given image pixel depends only on image
intensity/color values within a window. All local algorithms
require cost aggregation (step 2), and usually make implic-
it smoothness assumptions by aggregating support. Global
algorithms, on the other hand, make explicit smoothness as-
sumptions and then solve an optimization problem. Such al-
gorithms typically omit the cost aggregation step, but rather
seek a disparity solution (step 3) that minimizes a global
cost function. Popular global methods include dynamicpro-
gramming [2, 23], belief propagation [13, 14, 15, 16] and
graph cuts [3]. Unlike local algorithms, a global algorithm
estimates the disparity at one pixel using the disparity esti-
mates at all the other pixels.
Cost aggregationmethods are traditionally performed lo-
cally by summing/averaging matching cost over windows
with constant disparity. The most efficient local cost aggre-
gation method is unnormalized box filtering which runs in
linear time (relative to the number of image pixels) using in-
tegral image [24] (also known as a summed area table [8])
but blurs across depth edges. Yoon and Kweon [6] demon-
strated that edge-aware filters like bilateral filter [22] are
very effective for preserving depth edges and Yang et al.
[5] used bilateral filter for depth superresolution. However,
full-kernel implementation of the bilateral filter is slow.
A number of approximation methods have been devel-
oped to accelerate the bilateral filter, including Paris and
Durand’s fast bilateral filter [12], Porikli’s O(1) bilateral fil-
ter [17] and Yang’s real-time bilateral filters [25, 26]. These
methods rely on quantization, and will degrade the perfor-
mance as demonstrated in [18]. Paris and Durand’s method
1