1057-7149 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2019.2903318, IEEE
Transactions on Image Processing
III. OVERVIEW OF THE TWO-LAYER OPTIMIZATION
The matching cost volume $C_d(x)$ is generated by MC-CNN¹ [6]. The disparity map $d(x)$ is computed by winner-take-all:
$$d(x) = \arg\min_{d^*} C_{d^*}(x) \tag{1}$$
The left reference image is segmented into superpixels $\{s_k\}$ by graph-based segmentation [37]. The workflow of our method is shown in Fig. 1.
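As an illustration of the winner-take-all rule in (1), the following minimal sketch picks, for every pixel, the disparity with the lowest matching cost. The nested-list layout `cost_volume[d][y][x]`, the function name, and the toy numbers are assumptions made for this example; they do not come from the MC-CNN code.

```python
def winner_take_all(cost_volume):
    """Return the WTA disparity map for a cost volume.

    cost_volume[d][y][x] holds the matching cost of disparity d at pixel (x, y).
    This layout is an assumption made for the sketch.
    """
    num_disp = len(cost_volume)
    height = len(cost_volume[0])
    width = len(cost_volume[0][0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # arg min over the candidate disparities, as in (1)
            disparity[y][x] = min(range(num_disp), key=lambda d: cost_volume[d][y][x])
    return disparity


# Toy cost volume with 3 disparities and a 1x2 image (made-up numbers).
toy_cost = [
    [[0.9, 0.2]],  # costs for d = 0
    [[0.1, 0.5]],  # costs for d = 1
    [[0.4, 0.7]],  # costs for d = 2
]
wta = winner_take_all(toy_cost)
```

In the toy volume the first pixel is cheapest at d = 1 and the second at d = 0, so those are the labels the rule selects.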
We propose a two-layer optimization to refine the WTA disparity map. In the global optimization layer (Section IV), a front-parallel disparity map is estimated by MRF optimization, and the 3D neighborhood system $N_{3d}$ is derived from the superpixels' mean disparities $\{\mu_s\}$. In the local optimization layer (Section V), slanted planes $\{\pi_s\}$ are fitted to the superpixels by RANSAC, and the mean disparities $\{\mu_s\}$ are used to constrain the fitting. The initial slanted disparity map is then refined by a probabilistic model that exploits Bayesian inference and Bayesian prediction in the 3D neighborhood system. Both optimization layers operate at the superpixel level and are therefore efficient.
IV. FRONT-PARALLEL DISPARITY MAP
We use the global MRF optimization to estimate a front-
parallel disparity map. Superpixels are formulated as graph
nodes. MRF optimization aims to minimize the following
energy:
$$E(\mu) = \sum_{s \in \Omega} \phi_s(\mu_s) + \lambda \sum_{(s,t) \in N} \psi_{st}(\mu_s, \mu_t), \tag{2}$$
where $\mu_s$ is the label, in our case the mean disparity of superpixel $s$; $\Omega = \{s_k\}$ is the set of superpixels; $N$ is the set of pairs of neighboring superpixels; $\phi_s(\mu_s)$ is the data term; $\psi_{st}(\mu_s, \mu_t)$ is the smoothness term; and $\lambda$ is a parameter that balances the influence of the smoothness term. In contrast to a 3D-label MRF, optimizing a 1D label at the superpixel level is efficient (Section IV-B).
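To make the structure of the energy in (2) concrete, the sketch below evaluates it for one candidate labeling. It only evaluates the energy and does not minimize it. The function name, the callable interfaces, and the toy costs are hypothetical, and the absolute difference used as the pairwise cost is a stand-in for illustration only.

```python
def mrf_energy(labels, data_term, smoothness_term, neighbors, lam):
    """Evaluate E(mu) in (2) for one labeling.

    labels: dict mapping superpixel id -> mean disparity mu_s
    data_term(s, mu_s) and smoothness_term(s, t, mu_s, mu_t): callables
    neighbors: iterable of (s, t) pairs, each unordered pair listed once
    lam: weight of the smoothness term
    """
    data = sum(data_term(s, mu) for s, mu in labels.items())
    smooth = sum(smoothness_term(s, t, labels[s], labels[t]) for s, t in neighbors)
    return data + lam * smooth


# Hypothetical toy problem: three superpixels in a chain, two candidate labels.
toy_data = {0: {4: 1, 10: 5}, 1: {4: 2, 10: 3}, 2: {4: 6, 10: 0}}
toy_neighbors = [(0, 1), (1, 2)]
energy = mrf_energy(
    labels={0: 4, 1: 4, 2: 10},
    data_term=lambda s, mu: toy_data[s][mu],
    smoothness_term=lambda s, t, mu_s, mu_t: abs(mu_s - mu_t),  # stand-in pairwise cost
    neighbors=toy_neighbors,
    lam=0.5,
)
```

Because each superpixel carries a single scalar label, the label space stays small, which is the source of the efficiency noted above.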
We propose a novel data term that is based on the disparity distribution (Section IV-A) instead of a matching cost or a similarity measure between the left and right images. To handle foreground-background occlusions, the 3D neighborhood system, which represents depth discontinuities, is derived from $\{\mu_s\}$ (Section IV-C). We also study a special case and prove that the 1D-label MRF formulation cannot model highly slanted surfaces (Section IV-D).
A. Disparity Distribution Interpretation
Segment-based stereo methods assume that disparities are approximately linear within a segment. Under the piecewise planar surface assumption, the disparities of a planar surface with appropriate boundaries should be evenly distributed. Considering the irregular boundary shape of superpixels, we model the disparity distribution within a superpixel $s$ as a normal distribution
$$\mathrm{Norm}_d(\mu_s, \sigma_s) = \frac{1}{\sqrt{2\pi}\,\sigma_s} \exp\left(-\frac{(d - \mu_s)^2}{2\sigma_s^2}\right), \tag{3}$$
where $d$ represents the disparity, and $\mu_s$ and $\sigma_s$ are the mean and standard deviation of the disparities in superpixel $s$, respectively. A higher $\sigma_s$ indicates a more slanted surface, while for a front-parallel surface $\sigma_s$ is approximately zero. The data term of (2) is based on disparity distribution histograms, as described in Section IV-B.

¹ Downloaded from https://github.com/t-taniai/LocalExpStereo
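The relation between surface slant and the spread of the disparity distribution can be checked numerically. The sketch below computes the sample mean and standard deviation of the disparities in one superpixel and evaluates the density in (3). The function names and disparity values are made up for illustration, and the population (divide-by-n) estimator is an assumption.

```python
import math


def disparity_stats(disparities):
    """Mean and standard deviation of the disparities inside one superpixel."""
    n = len(disparities)
    mean = sum(disparities) / n
    variance = sum((d - mean) ** 2 for d in disparities) / n
    return mean, math.sqrt(variance)


def normal_pdf(d, mean, sigma):
    """Density in (3); sigma must be positive."""
    return math.exp(-((d - mean) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


flat = [12.0, 12.1, 11.9, 12.0]          # nearly front-parallel patch
slanted = [8.0, 10.0, 12.0, 14.0, 16.0]  # disparity ramps across the patch
flat_mu, flat_sigma = disparity_stats(flat)
slanted_mu, slanted_sigma = disparity_stats(slanted)
```

Both patches have a mean disparity near 12, but the ramped patch has a much larger standard deviation, matching the interpretation of the spread as an indicator of slant.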
B. MRF Optimization
To estimate a front-parallel disparity map, we estimate the mean disparities of superpixels. The front-parallel plane of superpixel $s$ can be obtained by $\pi_s^{fp} = (0, 0, \mu_s)$. The data term and smoothness term of (2) are defined as follows:
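A short sketch of what the front-parallel plane means in practice is given below. It assumes the common convention that plane parameters (a, b, c) give the disparity at pixel (x, y) as d = a·x + b·y + c. This section does not state the convention explicitly, so treat it as an assumption; the function name is hypothetical.

```python
def plane_disparity(plane, x, y):
    """Disparity at pixel (x, y) for plane parameters (a, b, c).

    Assumes d = a * x + b * y + c, which is a convention chosen for this sketch.
    """
    a, b, c = plane
    return a * x + b * y + c


mu_s = 7.5
front_parallel = (0.0, 0.0, mu_s)  # plane of the form (0, 0, mu_s)
samples = [plane_disparity(front_parallel, x, y) for x, y in [(0, 0), (3, 9), (120, 45)]]
```

Under this convention the zero slopes make the disparity independent of pixel position, so every pixel of the superpixel receives the mean disparity.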
1) Data Term: To measure the confidence of disparity centers, the disparity distribution of each superpixel is divided into histogram bins. We count the number of pixels in superpixel $s$ whose WTA disparity $d_s(x)$ falls into a bin $B(\mu_s)$ of width $L$. The data term of $s$ is defined as
$$\phi_s(\mu_s) = N_s - \sum_{i=1}^{N_s} I\big(d_s(x_i) \in B(\mu_s)\big), \tag{4}$$
where $N_s$ is the number of pixels in superpixel $s$, and $\mu_s$ takes the discrete values $\mu_s = 0, L, 2L, \cdots$. A lower data term implies higher confidence because the vote count enters with a negative sign. $I$ is the indicator function, defined as
$$I(\cdot) = \begin{cases} 1, & \text{if } \cdot \text{ is true} \\ 0, & \text{if } \cdot \text{ is false} \end{cases}, \tag{5}$$
and in (4) $I$ indicates whether the disparity $d_s(x_i)$ falls into bin $B(\mu_s)$, i.e. $d_s(x_i) \in [\mu_s, \mu_s + L)$.
The data term is voting-based: the more observations fall into the same bin, the higher the confidence. The WTA disparities in occluded regions are corrupted by noise and rarely reach a consensus. Therefore, the data term in occluded regions is relatively high and the label is dominated by the smoothness term.
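The voting behavior of the data term in (4) can be sketched as follows. The function names, the `max_disparity` bound, and the disparity samples are hypothetical and chosen only to contrast a consistent superpixel with a noisy one.

```python
def data_term(wta_disparities, mu, bin_width):
    """phi_s(mu) in (4): number of pixels that do NOT vote for the bin [mu, mu + L)."""
    votes = sum(1 for d in wta_disparities if mu <= d < mu + bin_width)
    return len(wta_disparities) - votes


def data_costs(wta_disparities, bin_width, max_disparity):
    """Data term for every candidate label mu = 0, L, 2L, ... up to max_disparity."""
    costs = {}
    mu = 0
    while mu <= max_disparity:
        costs[mu] = data_term(wta_disparities, mu, bin_width)
        mu += bin_width
    return costs


visible = [8.2, 8.9, 9.5, 8.1, 9.9, 8.4]     # consistent WTA disparities
occluded = [1.3, 14.6, 7.2, 22.8, 9.1, 3.7]  # noisy WTA disparities
visible_costs = data_costs(visible, bin_width=2, max_disparity=24)
occluded_costs = data_costs(occluded, bin_width=2, max_disparity=24)
```

In the consistent superpixel all six pixels vote for the bin starting at 8, so the cost of that label drops to zero. In the noisy superpixel no bin collects more than one vote, so every label remains expensive and the smoothness term decides the label.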
2) Smoothness Term: The smoothness term enforces the similarity of disparity distribution centers among neighboring superpixels. It is defined as
$$\psi_{st}(\mu_s, \mu_t) = \max(\omega_{st}, \epsilon)\, L(s, t)\, T(\mu_s, \mu_t), \tag{6}$$
where $\omega_{st}$ is a color-similarity weight defined as
$$\omega_{st} = e^{-\|I(s) - I(t)\|_2 / \gamma}, \tag{7}$$
where $\gamma$ is a parameter that controls the influence of the color weight, and $I(s)$ denotes the average color of superpixel $s$; $\epsilon$ is a lower-bound truncation value [29]; $L(s, t)$ [38] is the shared boundary length between neighboring superpixels $s$ and $t$; and $T$ can be a metric or a semi-metric, which will be defined in Section IV-C.
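A minimal sketch of how the factors of the smoothness term combine is given below. Several parts are assumptions: the color distance is taken as the Euclidean norm of the mean-color difference, the lower-bound truncation value is passed as `eps`, and `truncated_abs` is only a placeholder for $T$, whose actual definition is deferred to Section IV-C. All names and numbers are hypothetical.

```python
import math


def color_weight(color_s, color_t, gamma):
    """Color-similarity weight as in (7), assuming a Euclidean color distance."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(color_s, color_t)))
    return math.exp(-dist / gamma)


def truncated_abs(mu_s, mu_t, tau=4.0):
    """Placeholder label distance; NOT the definition of T from Section IV-C."""
    return min(abs(mu_s - mu_t), tau)


def smoothness_term(color_s, color_t, boundary_length, mu_s, mu_t, gamma, eps, label_distance):
    """psi_st as in (6): truncated color weight x shared boundary length x label distance."""
    weight = max(color_weight(color_s, color_t, gamma), eps)
    return weight * boundary_length * label_distance(mu_s, mu_t)


# Same mean color: full weight, so a label difference is penalized strongly.
similar = smoothness_term((90, 90, 90), (90, 90, 90), 10, 4, 6, gamma=10.0, eps=0.01,
                          label_distance=truncated_abs)
# Very different mean colors: the weight is clamped to eps, so the penalty is small but nonzero.
different = smoothness_term((0, 0, 0), (255, 255, 255), 10, 4, 6, gamma=10.0, eps=0.01,
                            label_distance=truncated_abs)
```

The lower bound keeps a small coupling between neighbors even across strong color edges, while the shared boundary length scales the penalty with how much contact the two superpixels have.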