没有合适的资源？快使用搜索试试~ 我知道了~

首页图割的立体匹配(基础入门)

资源详情

资源评论

资源推荐

21

Stereo Matching and Graph Cuts

Ayman Zureiki

1,2

, Michel Devy

1,2

and Raja Chatila

1,2

1

CNRS; LAAS; 7 avenue du Colonel Roche, F-31077 Toulouse,

2

Université de Toulouse; UPS, INSA, INP, ISAE; LAAS-CNRS : F-31077 Toulouse,

France

1. Introduction

The stereo matching aims to find corresponding entities between two (or more) images, i.e.

entities that are projections of the same 3D object in the scene. Constraints used in stereo

matching can be classified into two categories: local constraints, which rely only on a pixel

and on some pixels in its surrounding, and global constraints, which must be verified by the

whole pixels of a line or of the image. The local methods aim to find a matching for a given

pixel without taking into account neighbour pixels correspondences. Global methods try to

define a global model of the observed scene and to minimize a global cost function. They try

to find the correspondences once for all pixels in one line or for all pixels in the image.

Graph cuts techniques have been recently used to solve the stereo matching problem

involving global constraints. These methods transform the matching problem to a

minimization of a global energy function. The minimization is achieved by finding out an

optimal cut (of minimum cost) in a special graph. Different methods were proposed to

construct the graph. However, when applied to minimize a global cost function in stereo

vision, all of them consider for each pixel, all possible disparities between minimum and

maximum values. Our contribution is a new method for constructing a reduced graph: only

some potential values in the disparity range are selected for each pixel. These values can be

provided by a preliminary matching process using only local constraints. We will detail how

this method allows to make wider the disparity range, and at the same time to limit the

volume of the graph, and therefore to reduce the computation time.

In this chapter, the stereo matching problem is defined. A brief state of the art about stereo

matching is presented. The stereo matching can be solved by either local methods or global

ones. Among the global methods, we give a short introduction of dynamic programming

techniques to be a logic introduction of the Graph Cuts methods. We recall the definition of

Graph, flow, cut, and the different algorithms to solve the problem of maximum flow. A

general formalism of Relabelling problem is used to express the stereo matching as a

minimization of an energy function. The implementation of both complete graph (Roy &

Cox, 1998) and reduced graph (our contribution) are detailed. The two methods are

compared from experimental results.

2. Stereo matching

The stereo vision is a tool to find 3D information on a scene perceived by two images (or

more) acquired at the same moment from different points of view. The stereo

Open Access Database www.i-techonline.com

Source: Stereo Vision, Book edited by: Dr. Asim Bhatti,

ISBN 978-953-7619-22-0, pp. 372, November 2008, I-Tech, Vienna, Austria

www.intechopen.com

Stereo Vision

350

reconstruction is based on the aptitude to find in each image the projection of the same

object in the scene. In fact, depth information of an object is related from one side to the

disparity (difference of projections of the object in both images), and on the other side, to the

relative position of both cameras (baseline) and to the image resolution focal length, etc.).

Thus, the stereoscopic reconstruction raises two different problems. The first is the disparity

calculation, which is attached to the problem of stereo correspondence (matching). The

second problem is the ability to inverse the projective geometry problem. In other words,

the tridimensional reconstruction, or how to exploit disparity knowledge and relative

position of the two sensors to find 3D information. The works of (Faugeras, 1993), on the

projective geometry have established a solid basis for the 3D reconstruction problem. For

the matching problem, there is no method sufficiently reliable, robust and effective that

allows a simple use of stereo vision as a sensor of depth measurement.

Binocular stereo vision uses two images acquired by two cameras. A preliminary phase of

calibration is needed to estimate the different parameters of a stereo rig: the parameters of

the projection model for each camera (pinhole geometric model) and the spatial relationship

between the two cameras. This knowledge allows us to calculate the 3D coordinates of a

point from its projections in the two images by a simple triangulation.

Stereo matching is one of the most studied topics in computer vision since more than half a

century (Julesz, 1962). Stereo matching aims at finding in the left and right images, features,

which are the projections of the same entity in the 3D scene. In general, the matching

problem can be seen as a minimization problem. Local approaches try to minimize

separately many energy functions, representing local matching costs supposed independent

between different entities to be matched: a local cost depends on similarity constraints

between these entities. Global approaches try to minimize a unique energy function taking

into account all matching costs: this global cost integrates local matching costs, and also

compatibility costs expressing how consistent are matchings computed on a line or on the

whole image. A detailed taxonomy of stereo correspondence algorithms is proposed in

(Scharstein & Szeliski, 2002). The authors classify stereo matching methods with respect to

four criteria: (i) The local matching cost, (ii) The aggregation area while computing the local

cost, (iii) The optimization method, (iv) The method performed to refine matching results.

The method developed at LAAS since 1995 (Grandjean & Lasserre, 1995) is a modified

version of the dense or pixel-based algorithm described by (Faugeras et al., 1993). This

method is suitable for robotic applications, especially because it satisfies real-time

constraints. It can be classified under local methods, because there is no a global

optimization step to make consistent matchings of adjacent pixels. Here we propose to

apply such a global optimization method as a second step of our matching algorithm, to

take advantage of the global minimization while remaining within real-time constraints as

much as possible. We consider that left and right images are already rectified: it allows to

limit the research area in the right image, for the match of a pixel in the left image.

2.1 Local stereo matching methods

Local stereo matching methods search separately for the best match of each pixel strating

from one image (e.g. the left one) without taking into account the matches of other ones. The

matching cost between two pixels is based on similarity measurements of the local intensity

function. Intuitively, the projections of the same 3D point will have (naturally) similar

www.intechopen.com

Stereo Matching and Graph Cuts

351

intensities in the two images. In fact, the Lambertian model (Horn, 1986) assumes that the

object surface reflects uniformly the light in all directions. Using this model, we can suppose

that the corresponding pixels in both images are similar, and indeed, their neighbours are

also similar, assuming that view fields between the two cameras are very close (no

occlusions). A correlation measurement can calculate a degree of similarity between two

point sets. Local methods try to find a match p2 in the right image for a point p1 in the left

one. The correlation measurement uses information given by p1, p2 and their neighbour

pixels. The pixel p1 and its neighbours form a first point set, and the point p2 and its

neighbours constitute the other point set. A correlation score evaluates the similarity

between these data sets.

Many local approaches use comparison windows centred on the considered pixel. Among

the most known measurements, we can find: Sum of Squared Differences (SSD) (Cox et al.,

1996), Sum of Absolute Differences (SAD) (Hirschmüller, 2001), Zero-mean Normalized

Cross-Correlation (ZNCC) (Chen & Medioni, 1999), (Sára, 2002), etc.

Local stereo correspondence methods are in general fast algorithms, so can be used for real-

time applications. However, they are exposed to many failure sources, in particular

occlusions or variations of intensity between the two images. In fact, these situations can

produce many false matches. In addition, because of the absence of any constraint between

matches, adjacent pixels can have very different disparities, which can be particularly

remarkable in scenes having vertical lines (edges of an open door for example). Global

methods try to overcome these problems.

2.2 Global stereo matching methods

Global approaches try to define a global model of observed scene and to minimize a global

cost function. Matches for pixels of one line or or pixels of the whole image, are searched at

the same time. In a global method, the matching between a pixel in the left image and a

pixel in the right image does not depend only on their neighbours, but also on the matches

of their neighbours. Hence, the match of a pixel influences the matches of its neighbour

pixels. This influence is modelled by regularization constraints on the matches set. Some

methods are based only on the epipolar constraint to transform the bidimensional matching

problem into one-dimensional problem, as in dynamic programming (Belhumeur, 1996),

(Cox, 1992). Other methods address the bidimensionnal problem by taking into account,

inter-lines constraints, i.e. compatibility constraints between matchings provided on every

epipolar lines, as in graph cuts algorithms (Boykov et al., 1998), (Ishikawa & Geiger, 1998).

The global regularization aims to reduce the sensibility of stereo correspondence to

ambiguities caused by occlusions, poor local texture or fluctuation of illumination. This

improvement has a cost, which is the increasing of algorithms complexity, and in

consequence, a longer execution time, in addition to some secondary effects due to this

regularization (smoothing). We detail two global methods: dynamic programming and

graph cuts.

Dynamic Programming:

Dynamic programming, introduced by Richard Bellman (Bellman, 1957), allows resolving

optimization problems having an objective function as a sum of monotone non-decreasing

functions of resources. In practice, this means that we can infer the optimal solution of a

problem using optimal solutions of sub-problems. The dynamic programming applied to

www.intechopen.com

Stereo Vision

352

stereo matching searches for a path of minimal cost through a matrix composed of possible

matches. To reduce the complexity, this technique is applied on two sets of points of the

same epipolar line. Thus, the stereo correspondence is applied successively to find

matchings for all pixels of a line of one image with pixels located on its epipolar line in the

other image (Ohta & Kanade, 1985).

To obtain a global path cost equal to the sum of the partial-paths costs, it is mandatory to

use additive costs. We define the local cost for each point in the research zone as the cost of a

local stereo matching (SAD, SSD, etc.). Occlusions can be taken into account, making

possible to link a set of image pixels with the same pixel in the other image; penalties are

considered for these relations (occlusion costs), which will be added to the global cost of any

path in the matrix (Ohta & Kanade, 1985), (Cox et al., 1996). This formulation presents many

inconvenient as the sensibility to the occlusion cost, the difficulty to guarantee inter-lines

consistency (Bobick & Intille, 1999), and the weak application of constraints on order and

continuity, that could be not satisfied.

The dynamic programming can help in finding matchings in poorly textured zones, and in

solving some occlusion problems. But this method brings also some weak points, as

complexity of calculation, possibility of propagation of a local error through all the research

line, and non-consistency of disparity between lines.

Graph Cuts:

The majority of stereo matching dynamic programming methods try to match pixels

between epipolar lines in both images without taking into account inter-lines consistency.

Hence, they do not use the bidimensionnal nature of the problem. To overcome this

drawback, and to take into consideration bidimensionnal continuity constraint, a solution

has been proposed using the graph theory. Initially, graph cuts applied to stereo matching

were proposed by (Roy & Cox, 1998), and then reformulated by (Veksler, 1999) in which the

matching problem is considered as a minimization of an energy function. Afterwards, the

iterative graph-cuts algorithms were introduced in (Kolmogorov & Zabih, 2001),

(Kolmogorov & Zabih, 2002a), (Kolmogorov & Zabih, 2002b).

The first global method based on graph cuts for stereo correspondence (Roy & Cox, 1998),

(Roy, 1999), started from the 1D formulation of the order constraint used by the dynamic

programming applied separately to each image line. They tried to find a more general 2D

formulation for this constraint to be applied to all lines together. They proposed a local

coherence constraint, which suggests that the disparity function is locally smooth, which

means that neighbour pixels in all directions have similar disparities. They applied this local

coherence constraint by defining a disparity matching cost, which depends on intensity

variations of matched pixels. In the case of two cameras, the matching cost is the squared

difference of intensities.

The next step in the method proposed by Roy and Cox is to estimate the optimal disparity

map over the entire image. The matching constraints are expressed in a 3D mesh composed

of planes, themselves composed of an image of nodes. There is a plane for each level of

disparity, and each node represents a matching between two pixels in original images. The

3D mesh is then transformed into a graph of maximal flow by connecting each node to its

four neighbours in the same plane by edges called occlusion edge, and with the two nodes

in the neighbour planes with edges called disparity edges. Edges are not oriented. We add

two special nodes: a source connected to all nodes in the plane of minimum disparity, and a

www.intechopen.com

Stereo Matching and Graph Cuts

353

sink connected to all nodes in the plane of maximum disparity. The weight of a disparity

edge is equal to the mean value of matching costs of the two nodes. For occlusion edges, the

weight is multiplied by a constant to control the smoothness of the optimal disparity map. A

graph cut will separate the nodes in two sub-sets: the optimal disparity map is constructed

by the assignment of each pixel with the bigger value of disparity for which the

corresponding node is still connected to the source.

(Ishikawa & Geiger, 1998) pointed out that the method of Roy and Cox can deal only with

convex maps. Thereby, it can only take into account linear penalties on disparity, which may

lead to not very good results due to an over smoothing of disparities. They proposed a novel

graph with oriented edges, and then it is possible to reinforce the constraint of uniqueness

and order. However, their method is also weak at discontinuities because of linear penalties.

Sub-optimal algorithms: (Boykov et al., 1998; Boykov et al., 1999) proposed another method

to resolve the stereo corresponding using graph cuts. The authors showed that the problem

of stereo corresponding could be formulated by a Markov Random Field (MRF). They showed

that the estimate Maximum A Priori (MAP) of such MRF, could be obtained by a minimal

multi-way cut, using a maximum flow. The advantage of such a method is that it accepts

non linear penalties for discontinuities, and then it gives more precise disparity maps

especially near objects' edges. As the general problem of minimal multi-way cut is NP-

complete (Dahlhaus et al., 1994), Boylov et al. decided to introduce an approximated

algorithm, which can resolve iteratively some sub-problems until convergence.

(Kolmogorov & Zabih, 2001) have continued the works of Boykov trying to ameliorate the

energy function to represent more explicitly the occlusion. The sub-optimal approach has a

wider application spectrum than the one proposed by Roy and Cox or by Ishikawa and

Geiger. However, it is iterative and sub-optimal, and then its convergence speed and the

quality of the obtained minimum must be supervised.

In spite of their good results, methods based on graph cuts are disadvantaged by several

limitations. Firstly, because of using the bidimensionnal continuity constraint, these

methods can cause an excess of regularization (over smoothing), which can appear as the

effects of a local filter. Secondly, as the penalty function (cost of attributing different

disparities to neighbour pixels) is not always convex, this returns the minimization of

energy function to be a NP-complete problem (Kolmogorov & Zabih, 2001), and in

consequence, the obtained solution is only an approximation of the exact one.

3. Minimum cut problem

3.1 Graph definition

A graph is a set of vertices connected by edges. A binary relation between vertices, called

adjacency relation, defines the connection. A graph G be defined as a pair (V,E), where V is

a set of vertices, and E is a set of edges between the vertices E = {(u,v) | u, v ∈ V}. If the

graph is undirected (see figure 1), the adjacency relation is symmetric, or E = {{u,v} | u, v ∈

V} (sets of vertices rather than ordered pairs). In a directed graph (cf. figure 2), the edges are

ordered, the edge (u,v) is different from the edge (v,u). When an edge e=(u,v) connects the

vertices u and v, these two vertices are adjacent, and called the extremities of the edge e.

Two edges are called adjacent if they have in common one vertex. A weighted graph is a

graph having a weight, or a number, associated with each edge. Some algorithms require all

weights to be non-negative, integral, positive, etc.

www.intechopen.com

剩余24页未读，继续阅读

安全验证

文档复制为VIP权益，开通VIP直接复制

信息提交成功

## 评论0