Taxonomy and Evaluation of Stereo Vision Matching Algorithms

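This survey paper by Daniel Scharstein and Richard Szeliski appeared in the International Journal of Computer Vision, volume 47, issues 1/2/3. It presents a taxonomy and evaluation of dense two-frame stereo matching algorithms. The paper aims to give newcomers to computer vision a quick entry point into stereo matching, and it analyzes and experimentally compares existing stereo methods to assess their performance.

Stereo matching is a core research topic in computer vision: given two images of a scene taken from different viewpoints (usually called the left and right views), it finds corresponding pixels so that the three-dimensional structure of the scene can be recovered. The paper first proposes a taxonomy of stereo methods, designed to analyze and compare the individual components and design decisions of stereo algorithms. The taxonomy covers the key building blocks of an algorithm: matching cost computation, cost aggregation, disparity computation and optimization, and disparity refinement. The authors describe the principles of local (window-based) methods, global optimization methods such as dynamic programming, simulated annealing, and graph cuts, and cooperative algorithms. For each family they discuss strengths, weaknesses, and problems that arise in practice, such as disparity discontinuities, illumination changes, and noise.

Scharstein and Szeliski also ran a series of experiments on several data sets to evaluate the performance of different stereo matching algorithms. The results offer a basis for choosing a suitable algorithm and show how individual factors affect accuracy and speed. To encourage further research and comparison, they created a shared software platform and standard data sets, so that researchers can evaluate and compare new stereo algorithms more easily. The paper is both a tutorial for newcomers to stereo vision and a reference for researchers who want to understand existing methods, improve them, or develop new ones.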

A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms

Figure 2. Stereo matching using dynamic programming. For each pair of corresponding scanlines, a minimizing path through the matrix of all pairwise matching costs is selected. Lowercase letters (a–k) symbolize the intensities along each scanline. Uppercase letters represent the selected path through the matrix. Matches are indicated by M, while partially occluded points (which have a fixed cost) are indicated by L and R, corresponding to points only visible in the left and right image, respectively. Usually, only a limited disparity range is considered, which is 0–4 in the figure (indicated by the non-shaded squares). Note that this diagram shows an “unskewed” x-d slice through the DSI.

1998b). These approaches work by computing the minimum-cost path through the matrix of all pairwise matching costs between two corresponding scanlines. Partial occlusion is handled explicitly by assigning a group of pixels in one image to a single pixel in the other image. Figure 2 shows one such example.

Problems with dynamic programming stereo include the selection of the right cost for occluded pixels and the difficulty of enforcing inter-scanline consistency, although several methods propose ways of addressing the latter (Ohta and Kanade, 1985; Belhumeur, 1996; Cox et al., 1996; Bobick and Intille, 1999; Birchfield and Tomasi, 1998b). Another problem is that the dynamic programming approach requires enforcing the monotonicity or ordering constraint (Yuille and Poggio, 1984). This constraint requires that the relative ordering of pixels on a scanline remain the same between the two views, which may not be the case in scenes containing narrow foreground objects.
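The scanline formulation above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' C++ code; the function name `dp_scanline`, the absolute-difference match cost, and the default `occ_cost` value are assumptions made for this example. Each path step either matches a pair of pixels (M) or pays a fixed cost for a pixel visible in only one scanline (L or R). Because both indices only ever advance, the ordering constraint is built into the recurrence.

```python
def dp_scanline(left, right, occ_cost=2.0, max_disp=4):
    """Align two scanlines by dynamic programming.

    Returns (total_cost, path). Path entries are ('M', xl, xr) for a match,
    ('L', xl, None) for a pixel visible only in the left scanline, and
    ('R', None, xr) for a pixel visible only in the right scanline.
    occ_cost and the absolute-difference cost are illustrative choices.
    """
    n, m = len(left), len(right)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            # Match left pixel i-1 with right pixel j-1; the disparity
            # (i-1) - (j-1) must lie inside the allowed range.
            if i > 0 and j > 0 and 0 <= i - j <= max_disp:
                c = cost[i - 1][j - 1] + abs(left[i - 1] - right[j - 1])
                candidates.append((c, "M"))
            if i > 0:  # left pixel i-1 has no partner in the right view
                candidates.append((cost[i - 1][j] + occ_cost, "L"))
            if j > 0:  # right pixel j-1 has no partner in the left view
                candidates.append((cost[i][j - 1] + occ_cost, "R"))
            cost[i][j], move[i][j] = min(candidates)

    # Walk back from the end of both scanlines to recover the path.
    path, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "M":
            i, j = i - 1, j - 1
            path.append(("M", i, j))
        elif step == "L":
            i -= 1
            path.append(("L", i, None))
        else:
            j -= 1
            path.append(("R", None, j))
    path.reverse()
    return cost[n][m], path


left = [1, 2, 3, 4, 5, 6]
right = [2, 3, 4, 5, 6, 7]  # the same intensities shifted by one pixel
total, path = dp_scanline(left, right)
disparities = [xl - xr for kind, xl, xr in path if kind == "M"]
print(total, disparities)  # 4.0 [1, 1, 1, 1, 1]
```

In this toy input the cheapest path pays two occlusion costs (one L, one R) and matches the remaining five pixels at disparity 1. Raising `occ_cost` above 3 would make the all-match path at disparity 0 cheaper, which shows how sensitive the result is to the choice of occlusion cost.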

Cooperative Algorithms. Finally, cooperative algorithms, inspired by computational models of human stereo vision, were among the earliest methods proposed for disparity computation (Dev, 1974; Marr and Poggio, 1976; Marroquin, 1983; Szeliski and Hinton, 1985). Such algorithms iteratively perform local computations, but use nonlinear operations that result in an overall behavior similar to global optimization algorithms. In fact, for some of these algorithms, it is possible to explicitly state a global function that is being minimized (Scharstein and Szeliski, 1998). Recently, a promising variant of Marr and Poggio’s original cooperative algorithm has been developed (Zitnick and Kanade, 2000).

3.4. Refinement of Disparities

Most stereo correspondence algorithms compute a set of disparity estimates in some discretized space, e.g., for integer disparities (exceptions include continuous optimization techniques such as optic flow (Bergen et al., 1992) or splines (Szeliski and Coughlan, 1997)). For applications such as robot navigation or people tracking, these may be perfectly adequate. However, for image-based rendering, such quantized maps lead to very unappealing view synthesis results (the scene appears to be made up of many thin shearing layers). To remedy this situation, many algorithms apply a sub-pixel refinement stage after the initial discrete correspondence stage. (An alternative is to simply start with more discrete disparity levels.)

Sub-pixel disparity estimates can be computed in a variety of ways, including iterative gradient descent and fitting a curve to the matching costs at discrete disparity levels (Ryan et al., 1980; Lucas and Kanade, 1981; Tian and Huhns, 1986; Matthies et al., 1989; Kanade and Okutomi, 1994). This provides an easy way to increase the resolution of a stereo algorithm with little additional computation. However, to work well, the intensities being matched must vary smoothly, and the regions over which these estimates are computed must be on the same (correct) surface.
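One common form of curve fitting is a parabola through the cost at the winning integer disparity and its two neighbors. The sketch below assumes that variant; the function name `subpixel_disparity` and the fallback to the integer estimate are choices made for this example, not details taken from the paper. For costs $c_{-1}, c_0, c_{+1}$ at disparities $d-1, d, d+1$, the parabola's minimum lies at $d + (c_{-1} - c_{+1}) / (2(c_{-1} - 2c_0 + c_{+1}))$.

```python
def subpixel_disparity(costs, d):
    """Refine integer disparity d using a parabola through three costs.

    costs[k] is the matching cost at integer disparity k for one pixel.
    Falls back to d when a neighbor is missing or the fit is not convex.
    """
    if d <= 0 or d >= len(costs) - 1:
        return float(d)
    c_prev, c_best, c_next = costs[d - 1], costs[d], costs[d + 1]
    curvature = c_prev - 2.0 * c_best + c_next
    if curvature <= 0:
        return float(d)
    return d + 0.5 * (c_prev - c_next) / curvature


# Costs sampled from a parabola whose true minimum is at disparity 2.3.
costs = [(k - 2.3) ** 2 for k in range(5)]
best = min(range(len(costs)), key=costs.__getitem__)
print(best, subpixel_disparity(costs, best))  # integer estimate 2, refined value close to 2.3
```

The fit is exact here only because the sampled costs really are quadratic; with real matching costs the offset is an approximation.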

Recently, some questions have been raised about the advisability of fitting correlation curves to integer-sampled matching costs (Shimizu and Okutomi, 2001). This situation may even be worse when sampling-insensitive dissimilarity measures are used (Birchfield and Tomasi, 1998a). We investigate this issue in Section 6.4 below.

Besides sub-pixel computations, there are of course other ways of post-processing the computed disparities. Occluded areas can be detected using cross-checking (comparing left-to-right and right-to-left disparity maps) (Cochran and Medioni, 1992; Fua, 1993). A median filter can be applied to “clean up” spurious mismatches, and holes due to occlusion can be filled by surface fitting or by distributing neighboring disparity estimates (Birchfield and Tomasi, 1998b; Scharstein, 1999). In our implementation we are not performing such clean-up steps since we want to measure the performance of the raw algorithm components.
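Although the authors' implementation skips these steps, cross-checking and a simple form of hole filling are easy to sketch for a single scanline. The code below is an illustration under stated assumptions: both disparity rows store non-negative values, a left pixel at x corresponds to the right pixel at x - d, and the function names, the `tol` parameter, and the nearest-neighbor fill rule are hypothetical.

```python
def cross_check(disp_left, disp_right, tol=0):
    """Mark left-view disparities that disagree with the right view.

    A left pixel x with disparity d maps to right pixel x - d. The value is
    kept only if the right-view disparity there agrees within tol; otherwise
    the pixel is marked None (likely occluded or mismatched).
    """
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d
        if 0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= tol:
            checked.append(d)
        else:
            checked.append(None)
    return checked


def fill_holes(row):
    """Fill None entries with the nearest valid disparity to the left,
    falling back to the nearest valid value to the right."""
    filled = list(row)
    last = None
    for x in range(len(filled)):  # left-to-right pass
        if filled[x] is None:
            filled[x] = last
        else:
            last = filled[x]
    last = None
    for x in range(len(filled) - 1, -1, -1):  # fixes leading holes
        if filled[x] is None:
            filled[x] = last
        else:
            last = filled[x]
    return filled


disp_left = [1, 1, 1, 1, 3]   # the last value is a spurious mismatch
disp_right = [1, 1, 1, 1, 1]
checked = cross_check(disp_left, disp_right)
print(checked)              # [None, 1, 1, 1, None]
print(fill_holes(checked))  # [1, 1, 1, 1, 1]
```

The first pixel fails the check because its counterpart falls outside the right scanline, and the last fails because the two views disagree. Copying a neighbor's value is the crudest version of distributing neighboring disparity estimates; surface fitting would replace `fill_holes`.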

3.5. Other Methods

Not all dense two-frame stereo correspondence algorithms can be described in terms of our basic taxonomy and representations. Here we briefly mention some additional algorithms and representations that are not covered by our framework.

The algorithms described in this paper first enumerate all possible matches at all possible disparities, then select the best set of matches in some way. This is a useful approach when a large amount of ambiguity may exist in the computed disparities. An alternative approach is to use methods inspired by classic (infinitesimal) optic flow computation. Here, images are successively warped and motion estimates incrementally updated until a satisfactory registration is achieved. These techniques are most often implemented within a coarse-to-fine hierarchical refinement framework (Quam, 1984; Bergen et al., 1992; Barron et al., 1994; Szeliski and Coughlan, 1997).

A univalued representation of the disparity map is also not essential. Multi-valued representations, which can represent several depth values along each line of sight, have been extensively studied recently, especially for large multiview data sets. Many of these techniques use a voxel-based representation to encode the reconstructed colors and spatial occupancies or opacities (Szeliski and Golland, 1999; Seitz and Dyer, 1999; Kutulakos and Seitz, 2000; De Bonet and Viola, 1999; Culbertson et al., 1999; Broadhurst et al., 2001). Another way to represent a scene with more complexity is to use multiple layers, each of which can be represented by a plane plus residual parallax (Baker et al., 1998; Birchfield and Tomasi, 1999; Tao et al., 2001). Finally, deformable surfaces of various kinds have also been used to perform 3D shape reconstruction from multiple images (Terzopoulos and Fleischer, 1988; Terzopoulos and Metaxas, 1991; Fua and Leclerc, 1995; Faugeras and Keriven, 1998).

3.6. Summary of Methods

Table 1 gives a summary of some representative stereo matching algorithms and their corresponding taxonomy, i.e., the matching cost, aggregation, and optimization techniques used by each. The methods are grouped to contrast different matching costs (top), aggregation methods (middle), and optimization techniques (third section), while the last section lists some papers outside the framework. As can be seen from this table, quite a large subset of the possible algorithm design space has been explored over the years, albeit not very systematically.

4. Implementation

We have developed a stand-alone, portable C++ implementation of several stereo algorithms. The implementation is closely tied to the taxonomy presented in Section 3 and currently includes window-based algorithms, diffusion algorithms, as well as global optimization methods using dynamic programming, simulated annealing, and graph cuts. While many published methods include special features and post-processing steps to improve the results, we have chosen to implement the basic versions of such algorithms, in order to assess their respective merits most directly.

The implementation is modular and can easily be extended to include other algorithms or their components. We plan to add several other algorithms in the near future, and we hope that other authors will contribute their methods to our framework as well. Once a new algorithm has been integrated, it can easily be compared with other algorithms using our evaluation module, which can measure disparity error and reprojection error (Section 5.1). The implementation contains a sophisticated mechanism for specifying parameter values that supports recursive script files for exhaustive performance comparisons on multiple data sets.

We provide a high-level description of our code using the same division into four parts as in our taxonomy. Within our code, these four sections are (optionally) executed in sequence, and the performance/quality evaluator is then invoked. A list of the most important algorithm parameters is given in Table 2.

4.1. Matching Cost Computation

The simplest possible matching cost is the squared or absolute difference in color/intensity between corresponding pixels (match_fn). To approximate the effect of a robust matching score (Black and Rangarajan, 1996; Scharstein and Szeliski, 1998), we truncate the matching score to a maximal value match_max. When color images are being compared, we sum the
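For grayscale scanlines, the truncated squared or absolute difference described in this section can be sketched as follows. This is an illustrative Python fragment, not the authors' C++ code. Only the parameter names `match_fn` and `match_max` come from the text; the string values "AD" and "SD", the function name `matching_costs`, and the choice to charge `match_max` where no counterpart pixel exists are assumptions.

```python
def matching_costs(left, right, max_disp, match_fn="AD", match_max=20):
    """Build a per-scanline cost table dsi[d][x].

    dsi[d][x] is the truncated difference between left[x] and right[x - d].
    match_fn selects absolute ("AD") or squared ("SD") differences; both
    labels are hypothetical. match_max caps every cost, which limits the
    influence of outliers in the way a robust matching score would.
    """
    dsi = []
    for d in range(max_disp + 1):
        row = []
        for x in range(len(left)):
            xr = x - d
            if 0 <= xr < len(right):
                diff = left[x] - right[xr]
                cost = diff * diff if match_fn == "SD" else abs(diff)
                row.append(min(cost, match_max))
            else:
                # Assumption: pixels without a counterpart get the cap.
                row.append(match_max)
        dsi.append(row)
    return dsi


left = [0, 10, 20, 30]
right = [10, 20, 30, 40]  # same intensities shifted by one pixel
for d, row in enumerate(matching_costs(left, right, max_disp=1, match_max=5)):
    print(d, row)
# 0 [5, 5, 5, 5]
# 1 [5, 0, 0, 0]
```

At disparity 0 every raw difference is 10 and is truncated to 5, while at disparity 1 the three pixels that have a counterpart match exactly.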
