Deep-Learning-Driven Stereo Matching and Depth Map Acquisition


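"This research paper explores stereo matching and depth map collection algorithms based on deep learning. It builds a neural network with the Torch deep learning framework on the Linux platform and replaces the traditional algorithm for computing the matching cost function, improving the accuracy of stereo matching. The paper also describes how to optimize the convolutional neural network structure with different activation functions, batch normalization layers, and other methods to lower the mismatch rate, and how a post-processing algorithm (matching cost aggregation, disparity computation, and disparity refinement) produces the disparity map and depth map. Experiments verify the effect of the algorithm, which is also evaluated on the Middlebury stereo algorithm evaluation platform."

The paper studies stereo matching and depth map collection methods improved with deep learning. Traditional stereo matching algorithms usually rely on hand-designed features and matching criteria, whereas deep learning can learn complex patterns in images automatically and thereby improve matching accuracy. The author uses the Torch deep learning framework, an open-source toolkit widely used in deep learning research that provides a flexible environment for building and training neural network models.

The core of the work is replacing the traditional matching cost computation with a convolutional neural network (CNN). CNNs show strong representation learning ability in image processing tasks and can capture both local and global image features. To improve performance further, the paper modifies the CNN structure: it uses different activation functions (such as ReLU or Leaky ReLU) to increase nonlinear expressive power, and adds batch normalization layers to speed up training and reduce internal covariate shift.

After the matching cost is computed, the post-processing steps are essential. They are matching cost aggregation, which integrates costs along different paths to determine the best match; disparity computation, which estimates the depth of each pixel by finding the minimum-cost path; and disparity refinement, which improves the quality of the depth map by smoothing and correcting potentially erroneous disparity estimates.

The experimental section shows the algorithm in practice and demonstrates its effectiveness by comparison with results on standard datasets. The paper also evaluates the algorithm quantitatively on the Middlebury stereo algorithm evaluation platform, a widely recognized benchmark for stereo vision, which further supports its performance in stereo matching and depth map generation. The paper offers new insight into applying deep learning to stereo matching and is relevant to computer vision, autonomous driving, and robot navigation. A stereo matching algorithm optimized with deep learning improves matching accuracy and also opens the way to real-time, high-precision 3D scene reconstruction.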

Stereo Matching and Depth Map Collection Algorithm Based on Deep Learning

Hao Xu

School of Instrumentation Science and Optoelectronics Engineering

Beihang University

Beijing, China

15101501309@163.com

Abstract—This paper presents research on a stereo matching and depth map collection algorithm based on deep learning, using the Torch deep learning framework on the Linux platform to build the neural network. Following an open-source algorithm, the network uses a convolutional neural network in place of the traditional algorithm to compute the matching cost function. The paper also improves the structure of the convolutional neural network by using different activation functions, adding a batch normalization layer, and other methods, which reduces the matching error rate. The disparity map and depth map are then obtained by a post-processing algorithm that includes matching cost aggregation, disparity computation, and disparity refinement. The effect of the algorithm is verified by experiment, the experimental results are analyzed, and the algorithm is evaluated on the Middlebury stereo algorithm evaluation platform. The proposed algorithm achieves better stereo matching results than before the improvements.

Keywords—Deep learning; Stereo matching; Convolutional neural network; Disparity map; Depth map

1. INTRODUCTION

Stereo matching technology is now widely used in robot navigation, depth-of-field (DoF) rendering [1], and video processing [2,3,4]. Various algorithms have been presented in recent years, and many deep learning methods [5,6,7] have been applied to image research, especially methods that use neural networks [8,9,10,11].

In this paper, the matching cost is computed with a deep learning algorithm. Convolutional neural networks [12] have developed rapidly, and the architecture constructed by J. Žbontar and Yann LeCun [13] is currently the most popular. They describe two network architectures for learning a similarity measure on image patches: one designed for matching accuracy, the other for speed. Both achieved remarkable results.

In stereo vision, after capturing the left and right images with the stereo camera and computing the disparity d, we can compute the depth z using the following equation:

$$z = \frac{fB}{d} \tag{1}$$

where f denotes the focal length and B denotes the distance between the centers of the two cameras (the baseline). This equation transforms the problem of computing the depth map into the problem of computing the disparity map.
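The relation between disparity and depth (depth equals focal length times baseline, divided by disparity) can be illustrated with a minimal sketch. The function name, the camera parameters, and the choice to mark non-positive disparities as invalid are assumptions made for illustration, not values from the paper.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity map (pixels) to a depth map (metres) via z = f * B / d.

    Pixels with non-positive disparity have no valid depth and map to None
    (an assumption of this sketch).
    """
    depth = []
    for row in disparity_px:
        depth.append([
            focal_px * baseline_m / d if d > 0 else None
            for d in row
        ])
    return depth


# Hypothetical camera: 800 px focal length, 0.25 m baseline.
disparity = [[100.0, 50.0], [0.0, 20.0]]
depth = disparity_to_depth(disparity, focal_px=800.0, baseline_m=0.25)
print(depth)
```

Because depth is inversely proportional to disparity, halving the disparity doubles the estimated depth, so small disparity errors matter most for distant points.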

Furthermore, this paper assumes that the two cameras in the binocular stereo system differ only in their horizontal positions. This assumption can be satisfied through epipolar rectification of the images. We consider only dense stereo methods, which estimate depth for every pixel in the image.

According to the work of Scharstein [14], the classic stereo matching algorithm can be divided into four modules: matching cost computation, matching cost aggregation, disparity computation, and disparity refinement. The last three modules are referred to as the post-processing algorithm. The next section concentrates on computing the matching cost with the convolutional neural network.
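The four modules can be sketched end to end on a toy rectified image pair. This is only an illustration under simple stand-ins, none of which is the paper's method: absolute intensity difference replaces the CNN matching cost, a box filter is used for aggregation, winner-take-all for disparity computation, and a 3x3 median filter for refinement. All function names are hypothetical.

```python
def match_cost(left, right, max_disp):
    """Module 1: absolute-difference cost volume cost[d][y][x] (stand-in for the CNN)."""
    h, w = len(left), len(left[0])
    big = 255.0  # cost when the candidate match falls outside the right image
    return [[[abs(left[y][x] - right[y][x - d]) if x - d >= 0 else big
              for x in range(w)] for y in range(h)] for d in range(max_disp + 1)]


def aggregate(cost, radius=1):
    """Module 2: average each disparity slice over a (2r+1) x (2r+1) window."""
    out = []
    for plane in cost:
        h, w = len(plane), len(plane[0])
        agg = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                vals = [plane[j][i]
                        for j in range(max(0, y - radius), min(h, y + radius + 1))
                        for i in range(max(0, x - radius), min(w, x + radius + 1))]
                agg[y][x] = sum(vals) / len(vals)
        out.append(agg)
    return out


def winner_take_all(cost):
    """Module 3: per pixel, pick the disparity with the lowest aggregated cost."""
    h, w = len(cost[0]), len(cost[0][0])
    return [[min(range(len(cost)), key=lambda d: cost[d][y][x]) for x in range(w)]
            for y in range(h)]


def median_refine(disp):
    """Module 4: 3x3 median filter on interior pixels as a simple refinement."""
    h, w = len(disp), len(disp[0])
    out = [row[:] for row in disp]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(disp[j][i] for j in (y - 1, y, y + 1)
                            for i in (x - 1, x, x + 1))
            out[y][x] = window[4]
    return out


def tex(x, y):
    """Synthetic scene texture, strictly increasing in x."""
    return float(x * x + 3 * y)


# The right image sees the same texture shifted by 2 px, so the true disparity is 2.
h, w, shift = 5, 8, 2
left = [[tex(x, y) for x in range(w)] for y in range(h)]
right = [[tex(x + shift, y) for x in range(w)] for y in range(h)]

cost = match_cost(left, right, max_disp=4)
refined = median_refine(winner_take_all(aggregate(cost)))
# Columns 0 to 2 are influenced by the left border, where no match exists.
valid = [row[3:] for row in refined]
print(valid)
```

Away from the left border, the pipeline recovers the true shift of 2 pixels. In the paper's setting, only the first module changes: the learned similarity score takes the place of the absolute difference.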

2. MATCHING COST COMPUTATION VIA CONVOLUTIONAL NEURAL NETWORK

2.1 Neural network structure

Motivated by the mc-cnn-fst and mc-cnn-acrt algorithms in [13], we adopt two structures (a fast structure and an accurate structure) to learn the similarity between image patches. In both structures, the network input is a pair of small image patches and the network output is the similarity score of the two patches.
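The input/output contract described here, a pair of patches in and one similarity score out, can be sketched as follows. The sketch assumes a siamese arrangement in which both patches pass through the same feature extractor and the score is the cosine similarity of the two feature vectors, as in the fast architecture of [13]. The extractor itself is a toy stand-in (one random linear layer with a Leaky ReLU), not a trained convolutional network, and all names and sizes are hypothetical.

```python
import math
import random


def make_embedder(patch_size, n_features, seed=0):
    """Return a toy shared feature extractor: one linear layer plus Leaky ReLU.

    This stands in for the trained convolutional sub-network; weights are random.
    """
    rng = random.Random(seed)
    n_in = patch_size * patch_size
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_features)]

    def embed(patch):
        flat = [v for row in patch for v in row]
        feats = []
        for ws in weights:
            z = sum(wt * v for wt, v in zip(ws, flat))
            feats.append(z if z > 0 else 0.01 * z)  # Leaky ReLU
        return feats

    return embed


def cosine_similarity(a, b, eps=1e-12):
    """Normalized dot product of two feature vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + eps)


def similarity_score(embed, left_patch, right_patch):
    """Siamese scoring: the same embedder is applied to both patches."""
    return cosine_similarity(embed(left_patch), embed(right_patch))


# Two hypothetical 5x5 grayscale patches with values in [0, 1).
patch_a = [[(x + y) / 10 for x in range(5)] for y in range(5)]
patch_b = [[((x * y) % 3) / 3 for x in range(5)] for y in range(5)]

embed = make_embedder(patch_size=5, n_features=16)
score_same = similarity_score(embed, patch_a, patch_a)
score_diff = similarity_score(embed, patch_a, patch_b)
print(score_same, score_diff)
```

Sharing one embedder between the two branches means each patch's features can be computed once per image and reused across all candidate disparities, which is why this style of scoring is suited to a fast structure.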

Fig. 1 Fast network structure

The first structure is the fast structure, shown in Fig. 1.

This full text paper was peer-reviewed at the direction of IEEE Instrumentation and Measurement Society prior to the acceptance and publication.

