Convolutional Neural Networks Applied to Stereo Depth Computation


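This paper proposes a method for stereo depth computation based on a convolutional neural network (CNN). The network is trained to predict how well two image patches from an image pair match. The resulting cost is refined with cross-based cost aggregation and semiglobal matching, and a left-right consistency check then eliminates errors in occluded regions. The method achieves an error rate of 2.61% on the KITTI stereo dataset, the best result on that dataset as of August 2014.

CNN-based stereo depth computation is a computer vision technique for recovering depth from an image pair. Two images taken from different horizontal positions (usually called the left image and the right image) are used to compute the disparity of every pixel in the left image. Disparity is the difference in horizontal position of the same object in the two images, and it is the quantity from which depth is computed.

The role of the CNN is to learn to predict how well two image patches match. It is trained to capture image features that let it identify the best match for a pair of corresponding points across candidate disparities. The predicted match quality serves as a cost function that measures how plausible each disparity value is.

Once the CNN has produced the initial matching cost, cross-based cost aggregation and semiglobal matching refine it. Cross-based cost aggregation combines costs over a local neighborhood to estimate the matching cost more accurately. Semiglobal matching enforces smoothness constraints on the disparity map, which further improves matching accuracy.

A left-right consistency check then removes matching errors caused by occlusion. This step compares the disparity maps of the left and right images: if a match in one image cannot be mapped back to the other, or the mapped-back point has an inconsistent disparity, the match is treated as an error and corrected.

With these steps, the method reached an error rate of 2.61% on the KITTI stereo dataset, a notable result at the time. The approach is relevant to autonomous driving, robot navigation, and 3D reconstruction, and it has served as a reference for later deep learning research. The advantage of using a CNN for depth computation is that features are learned automatically rather than designed by hand, which helps the model cope with complex images and improves the accuracy and robustness of depth estimation.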


Computing the Stereo Matching Cost with a Convolutional Neural Network

Jure Zbontar

University of Ljubljana

jure.zbontar@fri.uni-lj.si

Yann LeCun

New York University

yann@cs.nyu.edu

Abstract

We present a method for extracting depth information from a rectified image pair. We train a convolutional neural network to predict how well two image patches match and use it to compute the stereo matching cost. The cost is refined by cross-based cost aggregation and semiglobal matching, followed by a left-right consistency check to eliminate errors in the occluded regions. Our stereo method achieves an error rate of 2.61% on the KITTI stereo dataset and is currently (August 2014) the top performing method on this dataset.

1. Introduction

Consider the following problem: given two images taken from cameras at different horizontal positions, the goal is to compute the disparity d for each pixel in the left image. Disparity refers to the difference in horizontal location of an object in the left and right image: an object at position (x, y) in the left image will appear at position (x − d, y) in the right image. Knowing the disparity d of an object, we can compute its depth z (i.e. the distance from the object to the camera) by using the following relation:

$$z = \frac{fB}{d}, \tag{1}$$

where f is the focal length of the camera and B is the distance between the camera centers.
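As a small worked example of relation (1), the sketch below converts a disparity into a depth. The function name `depth_from_disparity` and the numeric values are illustrative assumptions and do not come from the paper. It also assumes f and d are both expressed in pixels, so that z comes out in the same unit as B.

```python
def depth_from_disparity(d, f, B):
    """Return the depth z = f * B / d from relation (1).

    d: disparity in pixels (must be positive)
    f: focal length in pixels (assumed unit)
    B: distance between the camera centers, e.g. in meters
    """
    if d <= 0:
        raise ValueError("disparity must be positive to give a finite depth")
    return f * B / d


# Illustrative values only; they are not taken from the paper.
z = depth_from_disparity(d=50.0, f=700.0, B=0.5)
print(z)  # 7.0
```

Because z is inversely proportional to d, halving the disparity doubles the depth, so distant objects have small disparities and a fixed disparity error costs more depth accuracy the farther away the object is.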

The described problem is a subproblem of stereo reconstruction, where the goal is to extract 3D shape from one or more images. According to the taxonomy of Scharstein and Szeliski [14], a typical stereo algorithm consists of four steps: (1) matching cost computation, (2) cost aggregation, (3) optimization, and (4) disparity refinement. Following Hirschmuller and Scharstein [5], we refer to steps (1) and (2) as computing the matching cost and steps (3) and (4) as the stereo method.
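To make steps (1) and (3) of this taxonomy concrete, here is a minimal sketch on a single scanline. It uses a plain absolute-difference cost and winner-take-all selection as stand-ins; it is not the CNN-based cost or the optimization used in the paper. The function names `cost_volume_row` and `winner_take_all` are hypothetical.

```python
def cost_volume_row(left_row, right_row, max_disp):
    """Absolute-difference matching cost for one scanline.

    cost[x][d] compares left pixel x with right pixel x - d, following the
    convention that (x, y) in the left image appears at (x - d, y) in the
    right image. Disparities that would fall outside the right image keep
    an infinite cost.
    """
    width = len(left_row)
    inf = float("inf")
    cost = [[inf] * (max_disp + 1) for _ in range(width)]
    for x in range(width):
        for d in range(min(max_disp, x) + 1):
            cost[x][d] = abs(left_row[x] - right_row[x - d])
    return cost


def winner_take_all(cost):
    """For every pixel, pick the disparity with the lowest cost."""
    return [min(range(len(c)), key=c.__getitem__) for c in cost]


# Toy scanline: the right row is the left row shifted by 2 pixels.
left = [0, 0, 10, 20, 30, 40, 50, 60]
right = [10, 20, 30, 40, 50, 60, 0, 0]
disp = winner_take_all(cost_volume_row(left, right, max_disp=3))
print(disp)  # pixels 2..7 recover the true disparity of 2
```

The first two pixels cannot reach disparity 2 because x − d would leave the image, which is one reason real pipelines add aggregation and refinement on top of a raw per-pixel cost.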

We propose training a convolutional neural network [9] on pairs of small image patches where the true disparity is known (e.g. obtained by LIDAR). The output of the network is used to initialize the matching cost between a pair of patches. Matching costs are combined between neighboring pixels with similar image intensities using cross-based cost aggregation. Smoothness constraints are enforced by semiglobal matching and a left-right consistency check is used to detect and eliminate errors in occluded regions. We perform subpixel enhancement and apply a median filter and a bilateral filter to obtain the final disparity map. Figure 1 depicts the inputs to and the output from our method.
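The left-right consistency check mentioned above can be sketched on one scanline as follows. This is only an illustration under stated assumptions: integer disparities, a hypothetical tolerance parameter `tol`, and a simple keep-or-reject rule. The excerpt does not give the exact rule used in the paper, and the function name `left_right_check` is made up for this example.

```python
def left_right_check(disp_left, disp_right, tol=1):
    """Flag left-image pixels whose disparity agrees with the right map.

    A left pixel x with disparity d corresponds to right pixel x - d.
    The pixel is marked consistent only if that position lies inside the
    image and the right map's disparity there differs from d by at most
    `tol` (an assumed threshold).
    Returns a list of booleans, True meaning consistent.
    """
    ok = []
    for x, d in enumerate(disp_left):
        xr = x - d
        ok.append(0 <= xr < len(disp_right)
                  and abs(d - disp_right[xr]) <= tol)
    return ok


# Toy disparity rows (integer disparities assumed).
disp_left = [2, 2, 2, 2, 4, 2]
disp_right = [2, 2, 2, 2, 2, 2]
print(left_right_check(disp_left, disp_right))
# [False, False, True, True, False, True]
```

Pixels 0 and 1 fail because their match falls outside the right image, and pixel 4 fails because its disparity of 4 disagrees with the value 2 stored at the matched position. Rejected pixels would then need to be filled in by a later step.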

The two contributions of this paper are:

• We describe how a convolutional neural network can be used to compute the stereo matching cost.

• We achieve an error rate of 2.61% on the KITTI stereo dataset, improving on the previous best result of 2.83%.

2. Related work

Before the introduction of large stereo datasets [2, 13], relatively few stereo algorithms used ground-truth information to learn parameters of their models; in this section, we review the ones that did. For a general overview of stereo algorithms see [14].

Kong and Tao [6] used sum of squared distances to compute an initial matching cost. They trained a model to predict the probability distribution over three classes: the initial disparity is correct, the initial disparity is incorrect due to fattening of a foreground object, and the initial disparity is incorrect due to other reasons. The predicted probabilities were used to adjust the initial matching cost. Kong and Tao [7] later extended their work by combining predictions obtained by computing normalized cross-correlation over different window sizes and centers. Peris et al. [12] initialized the matching cost with AD-Census [11] and used multiclass linear discriminant analysis to learn a mapping from the computed matching cost to the final disparity.
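For readers unfamiliar with the two hand-crafted measures named above, the sketch below computes a sum of squared differences and a normalized cross-correlation between two flattened patches. It uses the standard textbook definitions, not the specific formulations of the cited works, and the function names `ssd` and `ncc` are chosen for this example.

```python
import math


def ssd(p, q):
    """Sum of squared differences between two equal-length patches."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ncc(p, q):
    """Zero-mean normalized cross-correlation, in the range [-1, 1].

    Returns 0.0 when either patch is constant, since the correlation is
    undefined there (a convention chosen for this sketch).
    """
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    num = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mp) ** 2 for a in p)
                    * sum((b - mq) ** 2 for b in q))
    return num / den if den else 0.0


# Same pattern, but the second patch is uniformly brighter by 10.
p = [1, 2, 3, 4]
q = [11, 12, 13, 14]
print(ssd(p, q))  # 400
print(ncc(p, q))  # 1.0
```

The example shows why the choice of measure matters: a constant brightness offset produces a large SSD even though the patches show the same pattern, while the normalized correlation is unaffected by it.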

Ground-truth data was also used to learn parameters of graphical models. Zhang and Seitz [22] used an alternative optimization algorithm to estimate optimal values of Markov random field hyperparameters. Scharstein and Pal
