没有合适的资源?快使用搜索试试~ 我知道了~
首页图割的立体匹配(基础入门)
资源详情
资源评论
资源推荐

21
Stereo Matching and Graph Cuts
Ayman Zureiki
1,2
, Michel Devy
1,2
and Raja Chatila
1,2
1
CNRS; LAAS; 7 avenue du Colonel Roche, F-31077 Toulouse,
2
Université de Toulouse; UPS, INSA, INP, ISAE; LAAS-CNRS : F-31077 Toulouse,
France
1. Introduction
The stereo matching aims to find corresponding entities between two (or more) images, i.e.
entities that are projections of the same 3D object in the scene. Constraints used in stereo
matching can be classified into two categories: local constraints, which rely only on a pixel
and on some pixels in its surrounding, and global constraints, which must be verified by the
whole pixels of a line or of the image. The local methods aim to find a matching for a given
pixel without taking into account neighbour pixels correspondences. Global methods try to
define a global model of the observed scene and to minimize a global cost function. They try
to find the correspondences once for all pixels in one line or for all pixels in the image.
Graph cuts techniques have been recently used to solve the stereo matching problem
involving global constraints. These methods transform the matching problem to a
minimization of a global energy function. The minimization is achieved by finding out an
optimal cut (of minimum cost) in a special graph. Different methods were proposed to
construct the graph. However, when applied to minimize a global cost function in stereo
vision, all of them consider for each pixel, all possible disparities between minimum and
maximum values. Our contribution is a new method for constructing a reduced graph: only
some potential values in the disparity range are selected for each pixel. These values can be
provided by a preliminary matching process using only local constraints. We will detail how
this method allows to make wider the disparity range, and at the same time to limit the
volume of the graph, and therefore to reduce the computation time.
In this chapter, the stereo matching problem is defined. A brief state of the art about stereo
matching is presented. The stereo matching can be solved by either local methods or global
ones. Among the global methods, we give a short introduction of dynamic programming
techniques to be a logic introduction of the Graph Cuts methods. We recall the definition of
Graph, flow, cut, and the different algorithms to solve the problem of maximum flow. A
general formalism of Relabelling problem is used to express the stereo matching as a
minimization of an energy function. The implementation of both complete graph (Roy &
Cox, 1998) and reduced graph (our contribution) are detailed. The two methods are
compared from experimental results.
2. Stereo matching
The stereo vision is a tool to find 3D information on a scene perceived by two images (or
more) acquired at the same moment from different points of view. The stereo
Open Access Database www.i-techonline.com
Source: Stereo Vision, Book edited by: Dr. Asim Bhatti,
ISBN 978-953-7619-22-0, pp. 372, November 2008, I-Tech, Vienna, Austria
www.intechopen.com

Stereo Vision
350
reconstruction is based on the aptitude to find in each image the projection of the same
object in the scene. In fact, depth information of an object is related from one side to the
disparity (difference of projections of the object in both images), and on the other side, to the
relative position of both cameras (baseline) and to the image resolution focal length, etc.).
Thus, the stereoscopic reconstruction raises two different problems. The first is the disparity
calculation, which is attached to the problem of stereo correspondence (matching). The
second problem is the ability to inverse the projective geometry problem. In other words,
the tridimensional reconstruction, or how to exploit disparity knowledge and relative
position of the two sensors to find 3D information. The works of (Faugeras, 1993), on the
projective geometry have established a solid basis for the 3D reconstruction problem. For
the matching problem, there is no method sufficiently reliable, robust and effective that
allows a simple use of stereo vision as a sensor of depth measurement.
Binocular stereo vision uses two images acquired by two cameras. A preliminary phase of
calibration is needed to estimate the different parameters of a stereo rig: the parameters of
the projection model for each camera (pinhole geometric model) and the spatial relationship
between the two cameras. This knowledge allows us to calculate the 3D coordinates of a
point from its projections in the two images by a simple triangulation.
Stereo matching is one of the most studied topics in computer vision since more than half a
century (Julesz, 1962). Stereo matching aims at finding in the left and right images, features,
which are the projections of the same entity in the 3D scene. In general, the matching
problem can be seen as a minimization problem. Local approaches try to minimize
separately many energy functions, representing local matching costs supposed independent
between different entities to be matched: a local cost depends on similarity constraints
between these entities. Global approaches try to minimize a unique energy function taking
into account all matching costs: this global cost integrates local matching costs, and also
compatibility costs expressing how consistent are matchings computed on a line or on the
whole image. A detailed taxonomy of stereo correspondence algorithms is proposed in
(Scharstein & Szeliski, 2002). The authors classify stereo matching methods with respect to
four criteria: (i) The local matching cost, (ii) The aggregation area while computing the local
cost, (iii) The optimization method, (iv) The method performed to refine matching results.
The method developed at LAAS since 1995 (Grandjean & Lasserre, 1995) is a modified
version of the dense or pixel-based algorithm described by (Faugeras et al., 1993). This
method is suitable for robotic applications, especially because it satisfies real-time
constraints. It can be classified under local methods, because there is no a global
optimization step to make consistent matchings of adjacent pixels. Here we propose to
apply such a global optimization method as a second step of our matching algorithm, to
take advantage of the global minimization while remaining within real-time constraints as
much as possible. We consider that left and right images are already rectified: it allows to
limit the research area in the right image, for the match of a pixel in the left image.
2.1 Local stereo matching methods
Local stereo matching methods search separately for the best match of each pixel strating
from one image (e.g. the left one) without taking into account the matches of other ones. The
matching cost between two pixels is based on similarity measurements of the local intensity
function. Intuitively, the projections of the same 3D point will have (naturally) similar
www.intechopen.com

Stereo Matching and Graph Cuts
351
intensities in the two images. In fact, the Lambertian model (Horn, 1986) assumes that the
object surface reflects uniformly the light in all directions. Using this model, we can suppose
that the corresponding pixels in both images are similar, and indeed, their neighbours are
also similar, assuming that view fields between the two cameras are very close (no
occlusions). A correlation measurement can calculate a degree of similarity between two
point sets. Local methods try to find a match p2 in the right image for a point p1 in the left
one. The correlation measurement uses information given by p1, p2 and their neighbour
pixels. The pixel p1 and its neighbours form a first point set, and the point p2 and its
neighbours constitute the other point set. A correlation score evaluates the similarity
between these data sets.
Many local approaches use comparison windows centred on the considered pixel. Among
the most known measurements, we can find: Sum of Squared Differences (SSD) (Cox et al.,
1996), Sum of Absolute Differences (SAD) (Hirschmüller, 2001), Zero-mean Normalized
Cross-Correlation (ZNCC) (Chen & Medioni, 1999), (Sára, 2002), etc.
Local stereo correspondence methods are in general fast algorithms, so can be used for real-
time applications. However, they are exposed to many failure sources, in particular
occlusions or variations of intensity between the two images. In fact, these situations can
produce many false matches. In addition, because of the absence of any constraint between
matches, adjacent pixels can have very different disparities, which can be particularly
remarkable in scenes having vertical lines (edges of an open door for example). Global
methods try to overcome these problems.
2.2 Global stereo matching methods
Global approaches try to define a global model of observed scene and to minimize a global
cost function. Matches for pixels of one line or or pixels of the whole image, are searched at
the same time. In a global method, the matching between a pixel in the left image and a
pixel in the right image does not depend only on their neighbours, but also on the matches
of their neighbours. Hence, the match of a pixel influences the matches of its neighbour
pixels. This influence is modelled by regularization constraints on the matches set. Some
methods are based only on the epipolar constraint to transform the bidimensional matching
problem into one-dimensional problem, as in dynamic programming (Belhumeur, 1996),
(Cox, 1992). Other methods address the bidimensionnal problem by taking into account,
inter-lines constraints, i.e. compatibility constraints between matchings provided on every
epipolar lines, as in graph cuts algorithms (Boykov et al., 1998), (Ishikawa & Geiger, 1998).
The global regularization aims to reduce the sensibility of stereo correspondence to
ambiguities caused by occlusions, poor local texture or fluctuation of illumination. This
improvement has a cost, which is the increasing of algorithms complexity, and in
consequence, a longer execution time, in addition to some secondary effects due to this
regularization (smoothing). We detail two global methods: dynamic programming and
graph cuts.
Dynamic Programming:
Dynamic programming, introduced by Richard Bellman (Bellman, 1957), allows resolving
optimization problems having an objective function as a sum of monotone non-decreasing
functions of resources. In practice, this means that we can infer the optimal solution of a
problem using optimal solutions of sub-problems. The dynamic programming applied to
www.intechopen.com

Stereo Vision
352
stereo matching searches for a path of minimal cost through a matrix composed of possible
matches. To reduce the complexity, this technique is applied on two sets of points of the
same epipolar line. Thus, the stereo correspondence is applied successively to find
matchings for all pixels of a line of one image with pixels located on its epipolar line in the
other image (Ohta & Kanade, 1985).
To obtain a global path cost equal to the sum of the partial-paths costs, it is mandatory to
use additive costs. We define the local cost for each point in the research zone as the cost of a
local stereo matching (SAD, SSD, etc.). Occlusions can be taken into account, making
possible to link a set of image pixels with the same pixel in the other image; penalties are
considered for these relations (occlusion costs), which will be added to the global cost of any
path in the matrix (Ohta & Kanade, 1985), (Cox et al., 1996). This formulation presents many
inconvenient as the sensibility to the occlusion cost, the difficulty to guarantee inter-lines
consistency (Bobick & Intille, 1999), and the weak application of constraints on order and
continuity, that could be not satisfied.
The dynamic programming can help in finding matchings in poorly textured zones, and in
solving some occlusion problems. But this method brings also some weak points, as
complexity of calculation, possibility of propagation of a local error through all the research
line, and non-consistency of disparity between lines.
Graph Cuts:
The majority of stereo matching dynamic programming methods try to match pixels
between epipolar lines in both images without taking into account inter-lines consistency.
Hence, they do not use the bidimensionnal nature of the problem. To overcome this
drawback, and to take into consideration bidimensionnal continuity constraint, a solution
has been proposed using the graph theory. Initially, graph cuts applied to stereo matching
were proposed by (Roy & Cox, 1998), and then reformulated by (Veksler, 1999) in which the
matching problem is considered as a minimization of an energy function. Afterwards, the
iterative graph-cuts algorithms were introduced in (Kolmogorov & Zabih, 2001),
(Kolmogorov & Zabih, 2002a), (Kolmogorov & Zabih, 2002b).
The first global method based on graph cuts for stereo correspondence (Roy & Cox, 1998),
(Roy, 1999), started from the 1D formulation of the order constraint used by the dynamic
programming applied separately to each image line. They tried to find a more general 2D
formulation for this constraint to be applied to all lines together. They proposed a local
coherence constraint, which suggests that the disparity function is locally smooth, which
means that neighbour pixels in all directions have similar disparities. They applied this local
coherence constraint by defining a disparity matching cost, which depends on intensity
variations of matched pixels. In the case of two cameras, the matching cost is the squared
difference of intensities.
The next step in the method proposed by Roy and Cox is to estimate the optimal disparity
map over the entire image. The matching constraints are expressed in a 3D mesh composed
of planes, themselves composed of an image of nodes. There is a plane for each level of
disparity, and each node represents a matching between two pixels in original images. The
3D mesh is then transformed into a graph of maximal flow by connecting each node to its
four neighbours in the same plane by edges called occlusion edge, and with the two nodes
in the neighbour planes with edges called disparity edges. Edges are not oriented. We add
two special nodes: a source connected to all nodes in the plane of minimum disparity, and a
www.intechopen.com

Stereo Matching and Graph Cuts
353
sink connected to all nodes in the plane of maximum disparity. The weight of a disparity
edge is equal to the mean value of matching costs of the two nodes. For occlusion edges, the
weight is multiplied by a constant to control the smoothness of the optimal disparity map. A
graph cut will separate the nodes in two sub-sets: the optimal disparity map is constructed
by the assignment of each pixel with the bigger value of disparity for which the
corresponding node is still connected to the source.
(Ishikawa & Geiger, 1998) pointed out that the method of Roy and Cox can deal only with
convex maps. Thereby, it can only take into account linear penalties on disparity, which may
lead to not very good results due to an over smoothing of disparities. They proposed a novel
graph with oriented edges, and then it is possible to reinforce the constraint of uniqueness
and order. However, their method is also weak at discontinuities because of linear penalties.
Sub-optimal algorithms: (Boykov et al., 1998; Boykov et al., 1999) proposed another method
to resolve the stereo corresponding using graph cuts. The authors showed that the problem
of stereo corresponding could be formulated by a Markov Random Field (MRF). They showed
that the estimate Maximum A Priori (MAP) of such MRF, could be obtained by a minimal
multi-way cut, using a maximum flow. The advantage of such a method is that it accepts
non linear penalties for discontinuities, and then it gives more precise disparity maps
especially near objects' edges. As the general problem of minimal multi-way cut is NP-
complete (Dahlhaus et al., 1994), Boylov et al. decided to introduce an approximated
algorithm, which can resolve iteratively some sub-problems until convergence.
(Kolmogorov & Zabih, 2001) have continued the works of Boykov trying to ameliorate the
energy function to represent more explicitly the occlusion. The sub-optimal approach has a
wider application spectrum than the one proposed by Roy and Cox or by Ishikawa and
Geiger. However, it is iterative and sub-optimal, and then its convergence speed and the
quality of the obtained minimum must be supervised.
In spite of their good results, methods based on graph cuts are disadvantaged by several
limitations. Firstly, because of using the bidimensionnal continuity constraint, these
methods can cause an excess of regularization (over smoothing), which can appear as the
effects of a local filter. Secondly, as the penalty function (cost of attributing different
disparities to neighbour pixels) is not always convex, this returns the minimization of
energy function to be a NP-complete problem (Kolmogorov & Zabih, 2001), and in
consequence, the obtained solution is only an approximation of the exact one.
3. Minimum cut problem
3.1 Graph definition
A graph is a set of vertices connected by edges. A binary relation between vertices, called
adjacency relation, defines the connection. A graph G be defined as a pair (V,E), where V is
a set of vertices, and E is a set of edges between the vertices E = {(u,v) | u, v ∈ V}. If the
graph is undirected (see figure 1), the adjacency relation is symmetric, or E = {{u,v} | u, v ∈
V} (sets of vertices rather than ordered pairs). In a directed graph (cf. figure 2), the edges are
ordered, the edge (u,v) is different from the edge (v,u). When an edge e=(u,v) connects the
vertices u and v, these two vertices are adjacent, and called the extremities of the edge e.
Two edges are called adjacent if they have in common one vertex. A weighted graph is a
graph having a weight, or a number, associated with each edge. Some algorithms require all
weights to be non-negative, integral, positive, etc.
www.intechopen.com
剩余24页未读,继续阅读

















安全验证
文档复制为VIP权益,开通VIP直接复制

评论0