Pyramid Stereo Matching Network: A 3D CNN Approach for Accurate Depth Estimation


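Pyramid Stereo Matching Network (PSMNet) is a deep learning method for estimating depth from a pair of stereo images. Depth estimation from stereo pairs has been formulated as a supervised learning problem that convolutional neural networks (CNNs) can solve, but earlier architectures rely mainly on patch-based Siamese networks, which make little use of global context when searching for correspondences in difficult regions such as occlusions and textureless areas.

PSMNet's core contribution is two modules: spatial pyramid pooling (SPP) and a 3D convolutional neural network (3D CNN). The SPP module aggregates context at different scales and locations to form a cost volume, so that features cover a wider field of view and matching improves where correspondences are hard to determine.

The 3D CNN then regularizes the cost volume using multiple stacked hourglass networks together with intermediate supervision, which helps reduce error and improve matching accuracy.

PSMNet was evaluated on several benchmark datasets and ranked first on the KITTI 2012 and 2015 leaderboards before March 18, 2018. The authors, Jia-Ren Chang and Yong-Sheng Chen, are from the Department of Computer Science, National Chiao Tung University, Taiwan, and their code is open source.

By combining global context with multi-scale feature aggregation, PSMNet improves stereo matching accuracy, which is relevant to autonomous driving, robot navigation, and 3D reconstruction.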

Pyramid Stereo Matching Network

Jia-Ren Chang Yong-Sheng Chen

Department of Computer Science, National Chiao Tung University, Taiwan

{followwar.cs00g, yschen}@nctu.edu.tw

Abstract

Recent work has shown that depth estimation from a stereo pair of images can be formulated as a supervised learning task to be resolved with convolutional neural networks (CNNs). However, current architectures rely on patch-based Siamese networks, lacking the means to exploit context information for finding correspondence in ill-posed regions. To tackle this problem, we propose PSMNet, a pyramid stereo matching network consisting of two main modules: spatial pyramid pooling and 3D CNN. The spatial pyramid pooling module takes advantage of the capacity of global context information by aggregating context in different scales and locations to form a cost volume. The 3D CNN learns to regularize cost volume using stacked multiple hourglass networks in conjunction with intermediate supervision. The proposed approach was evaluated on several benchmark datasets. Our method ranked first in the KITTI 2012 and 2015 leaderboards before March 18, 2018. The codes of PSMNet are available at: https://github.com/JiaRenChang/PSMNet

1. Introduction

Depth estimation from stereo images is essential to computer vision applications, including autonomous driving for vehicles, 3D model reconstruction, and object detection and recognition [4, 31]. Given a pair of rectified stereo images, the goal of depth estimation is to compute the disparity d for each pixel in the reference image. Disparity refers to the horizontal displacement between a pair of corresponding pixels on the left and right images. For the pixel (x, y) in the left image, if its corresponding point is found at (x − d, y) in the right image, then the depth of this pixel is calculated by $fB/d$, where f is the camera's focal length and B is the distance between two camera centers.
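
The relation between disparity and depth described above can be sketched in a few lines of Python. This is only an illustration: the function name `disparity_to_depth` and the numeric values for the focal length and baseline are assumptions chosen for the example, not values from the paper.

```python
import math


def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Depth in metres for a disparity in pixels, using Z = f * B / d."""
    if disparity_px <= 0:
        # No finite depth: zero disparity means the point is at infinity,
        # and a negative value is not a valid match in this convention.
        return math.inf
    return focal_length_px * baseline_m / disparity_px


# Illustrative camera parameters (assumed, not taken from the paper).
FOCAL_LENGTH_PX = 700.0
BASELINE_M = 0.5

for d in (70.0, 35.0, 7.0):
    depth = disparity_to_depth(d, FOCAL_LENGTH_PX, BASELINE_M)
    print(f"disparity {d:5.1f} px -> depth {depth:5.1f} m")
```

Because depth is inversely proportional to disparity, a fixed error of one pixel in d matters far more for distant points (small d) than for nearby ones.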

The typical pipeline for stereo matching involves the finding of corresponding points based on matching cost and post-processing. Recently, convolutional neural networks (CNNs) have been applied to learn how to match corresponding points in MC-CNN [30]. Early approaches using CNNs treated the problem of correspondence estimation as similarity computation [27, 30], where CNNs compute the similarity score for a pair of image patches to further determine whether they are matched. Although CNNs yield significant gains compared to conventional approaches in terms of both accuracy and speed, it is still difficult to find accurate corresponding points in inherently ill-posed regions such as occlusion areas, repeated patterns, textureless regions, and reflective surfaces. Solely applying the intensity-consistency constraint between different viewpoints is generally insufficient for accurate correspondence estimation in such ill-posed regions, and is useless in textureless regions. Therefore, regional support from global context information must be incorporated into stereo matching.
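
To make the textureless-region problem concrete, the toy example below compares a small window along one scanline at several candidate disparities. It is a sketch under stated assumptions: it uses a classical sum of absolute differences (SAD) as a stand-in for an intensity-consistency cost, not the learned patch similarity of MC-CNN, and the helper names and pixel values are hypothetical.

```python
def sad_costs(left_row, right_row, x, max_disp, half_window=1):
    """SAD between the window centred at x in the left row and the windows
    centred at x - d in the right row, for d = 0 .. max_disp."""
    costs = []
    for d in range(max_disp + 1):
        cost = 0
        for offset in range(-half_window, half_window + 1):
            cost += abs(left_row[x + offset] - right_row[x - d + offset])
        costs.append(cost)
    return costs


def shift_right(row, disp):
    """Simulate the left view: left[x] = right[x - disp], padding the border."""
    return [row[0]] * disp + row[:len(row) - disp]


TRUE_DISP = 2  # hypothetical ground-truth disparity for this toy scanline

# Textured scanline: the cost curve has a single clear minimum.
textured_right = [10, 50, 20, 90, 30, 70, 40, 80, 60, 15, 25, 35]
textured_left = shift_right(textured_right, TRUE_DISP)
textured_costs = sad_costs(textured_left, textured_right, x=8, max_disp=4)

# Textureless scanline: every candidate disparity matches equally well.
flat_right = [128] * 12
flat_left = shift_right(flat_right, TRUE_DISP)
flat_costs = sad_costs(flat_left, flat_right, x=8, max_disp=4)

print("textured:   ", textured_costs)
print("textureless:", flat_costs)
```

On the textured scanline the cost is lowest at the true disparity, while on the uniform scanline all costs tie, so a purely local cost cannot pick a disparity there. This is the gap that context from a larger region is meant to fill.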

One major problem with current CNN-based stereo matching methods is how to effectively exploit context information. Some studies attempt to incorporate semantic information to largely refine cost volumes or disparity maps [8, 13, 27]. The Displets [8] method utilizes object information by modeling 3D vehicles to resolve ambiguities in stereo matching. ResMatchNet [27] learns to measure reflective confidence for the disparity maps to improve performance in ill-posed regions. GC-Net [13] employs the encoder-decoder architecture to merge multiscale features for cost volume regularization.

In this work, we propose a novel pyramid stereo matching network (PSMNet) to exploit global context information in stereo matching. Spatial pyramid pooling (SPP) [9, 32] and dilated convolution [2, 29] are used to enlarge the receptive fields. In this way, PSMNet extends pixel-level features to region-level features with different scales of receptive fields; the resultant combined global and local feature clues are used to form the cost volume for reliable disparity estimation. Moreover, we design a stacked hourglass 3D CNN in conjunction with intermediate supervision to regularize the cost volume. The stacked hourglass 3D CNN repeatedly processes the cost volume in a top-down/bottom-up manner to further improve the utilization of global context information.
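
The idea of extending pixel-level features to region-level features at several scales can be sketched with NumPy as follows. This is a minimal illustration, not the PSMNet implementation: the pooling sizes, the nearest-neighbour upsampling, the absence of any learned convolution, and all function names are assumptions made for the example.

```python
import numpy as np


def average_pool(features, size):
    """Non-overlapping size x size average pooling of a (C, H, W) array."""
    c, h, w = features.shape
    if h % size or w % size:
        raise ValueError("H and W must be divisible by the pooling size")
    blocks = features.reshape(c, h // size, size, w // size, size)
    return blocks.mean(axis=(2, 4))


def upsample_nearest(pooled, size):
    """Repeat each pooled value size times along H and W."""
    return pooled.repeat(size, axis=1).repeat(size, axis=2)


def spatial_pyramid_pooling(features, pool_sizes=(2, 4, 8)):
    """Concatenate the input with region-level averages at several scales."""
    branches = [features]  # pixel-level features
    for size in pool_sizes:
        pooled = average_pool(features, size)            # region-level context
        branches.append(upsample_nearest(pooled, size))  # back to H x W
    return np.concatenate(branches, axis=0)


# Toy feature map with 2 channels on an 8 x 8 grid (assumed sizes).
features = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
pyramid = spatial_pyramid_pooling(features)
print(pyramid.shape)
```

Each output pixel now carries its own feature plus averages over progressively larger neighbourhoods, so a pixel in a uniform area still receives information from the textured structure around it.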

Our main contributions are listed below:
