Stereo Matching and Depth Map Collection
Algorithm Based on Deep Learning
Hao Xu
School of Instrumentation Science and Optoelectronics Engineering
Beihang University
Beijing, China
15101501309@163.com
Abstract—This paper presents a deep-learning-based algorithm
for stereo matching and depth map acquisition. The neural
network is built with the Torch deep learning framework on the
Linux platform. Following an open-source algorithm, it uses a
convolutional neural network in place of the traditional
algorithm to compute the matching cost. The paper further
improves the structure of the convolutional neural network by
using a different activation function, adding batch
normalization layers, and applying other modifications, which
reduces the matching error rate. The disparity map and depth
map are then obtained by a post-processing algorithm that
comprises matching cost aggregation, disparity computation,
and disparity refinement. The algorithm is verified
experimentally, the results are analyzed, and the algorithm is
evaluated on the Middlebury stereo evaluation platform. The
proposed algorithm achieves better stereo matching results
than the unmodified network.
Keywords—Deep learning; Stereo matching; Convolutional
neural network; Disparity map; Depth map
1. INTRODUCTION
Stereo matching is now widely used in robot navigation,
depth-of-field (DoF) rendering [1], and video processing
[2,3,4]. Various algorithms have been proposed in recent years,
and many deep learning methods [5,6,7], in particular neural
networks [8,9,10,11], have been applied to image research.
In this paper, the matching cost is computed with a deep
learning algorithm. Convolutional neural networks [12] have
developed rapidly, and the architecture proposed by
J. Zbontar and Y. LeCun [13] is the most popular at present.
They describe two network architectures for learning a
similarity measure on image patches: one is designed for
matching accuracy, the other for speed. Both achieved
remarkable results.
In stereo vision, after capturing the left and right images
using the stereo camera and computing the disparity d, we can
compute the depth z using the following equation.
$$z = \frac{fB}{d} \tag{1}$$
where f denotes the focal length of the cameras and B denotes
the baseline, i.e. the distance between the optical centers of
the two cameras. This equation transforms the problem of depth
map calculation into the problem of disparity map calculation.
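As a minimal sketch of Eq. (1), the following Python snippet converts a disparity map into a depth map. The function names, the pixel and metre units, and the numeric values are assumptions chosen for illustration and do not come from the paper. Pixels with a non-positive disparity are treated as invalid, because Eq. (1) is undefined for d = 0.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Return the depth z = f * B / d in metres, or None if d <= 0."""
    if disparity_px <= 0:
        return None  # no valid match, or a point at infinity
    return focal_length_px * baseline_m / disparity_px


def depth_map_from_disparity(disparity_map, focal_length_px, baseline_m):
    """Apply Eq. (1) to every pixel of a disparity map (list of rows)."""
    return [[disparity_to_depth(d, focal_length_px, baseline_m) for d in row]
            for row in disparity_map]


# Illustrative values only: f = 800 px, B = 0.25 m.
disparity_map = [[64.0, 32.0],
                 [16.0, 0.0]]
depth_map = depth_map_from_disparity(disparity_map,
                                     focal_length_px=800.0,
                                     baseline_m=0.25)
```

Because depth is inversely proportional to disparity, halving the disparity doubles the estimated depth, so a fixed disparity error costs more depth accuracy for distant points than for near ones.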
Furthermore, this paper assumes that the two cameras in the
binocular stereo system differ only in their horizontal
positions. This assumption can be satisfied through epipolar
rectification of the images. We only consider dense stereo
methods, which estimate depth for every pixel in the image.
According to the work of Scharstein, the classic stereo
matching algorithm can be divided into four modules [14]:
matching cost computation, matching cost aggregation,
disparity computation, and disparity refinement. The last three
modules are called the post-processing algorithm. The next
section will concentrate on the computation of matching cost
using the convolutional neural network.
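The first three modules can be illustrated on a single rectified scanline. The sketch below is not the algorithm of this paper. It assumes an absolute intensity difference as the matching cost, a simple box-window average for aggregation, and winner-take-all selection for disparity computation. Disparity refinement is omitted, and all names and values are hypothetical.

```python
def matching_cost(left_row, right_row, max_disp):
    """Cost volume cost[x][d] = |L[x] - R[x - d]|; inf if x - d is out of range."""
    inf = float("inf")
    return [[abs(left_row[x] - right_row[x - d]) if x - d >= 0 else inf
             for d in range(max_disp + 1)]
            for x in range(len(left_row))]


def aggregate(cost, radius):
    """Average the cost of each disparity over a window of +/- radius pixels."""
    width, n_disp = len(cost), len(cost[0])
    out = []
    for x in range(width):
        window = range(max(0, x - radius), min(width, x + radius + 1))
        out.append([sum(cost[i][d] for i in window) / len(window)
                    for d in range(n_disp)])
    return out


def winner_take_all(cost):
    """Pick, for each pixel, the disparity with the lowest aggregated cost."""
    return [min(range(len(c)), key=c.__getitem__) for c in cost]


# Toy scanline: the left row is the right row shifted by 2 pixels.
right_row = [10, 20, 30, 40, 50, 60, 70, 80]
left_row = [0, 0, 10, 20, 30, 40, 50, 60]
cost = matching_cost(left_row, right_row, max_disp=3)
disparities = winner_take_all(aggregate(cost, radius=1))
```

In this toy example the pixels near the left border cannot reach the true disparity because the matching pixel falls outside the image, which is one reason practical pipelines include a refinement stage.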
2. MATCHING COST COMPUTATION VIA
CONVOLUTIONAL NEURAL NETWORK
2.1 Neural network structure
Motivated by the mc-cnn-fst and mc-cnn-acrt algorithms in
[13], we adopt two structures (i.e., a fast structure and an
accurate structure) to learn the similarity between image
patches. In both structures, the network input is a pair of
small image patches and the network output is the similarity
score of the two patches.
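The input and output of such a network can be sketched as follows. This is only an illustration of the interface: the learned convolutional feature extractor is replaced by a hand-crafted stand-in (zero-mean, unit-norm flattening), and the score is assumed to be the cosine similarity of the two feature vectors. With this stand-in the score equals the classical zero-mean normalized cross-correlation. All names and values are hypothetical.

```python
import math


def handcrafted_features(patch):
    """Stand-in for the learned sub-network: zero-mean, unit-norm vector."""
    flat = [v for row in patch for v in row]
    mean = sum(flat) / len(flat)
    centered = [v - mean for v in flat]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0:
        return [0.0] * len(flat)  # textureless patch: no usable signal
    return [v / norm for v in centered]


def similarity(left_patch, right_patch, features=handcrafted_features):
    """Score in [-1, 1]: dot product of the two normalized feature vectors."""
    a = features(left_patch)
    b = features(right_patch)
    return sum(x * y for x, y in zip(a, b))


patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
brighter = [[v + 10 for v in row] for row in patch]  # same texture, offset
inverted = [[-v for v in row] for row in patch]      # opposite contrast

score_same = similarity(patch, brighter)
score_opposite = similarity(patch, inverted)
```

One common convention is to use the negated similarity score as the matching cost, so that a well-matched patch pair receives a low cost.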
Fig. 1 Fast network structure
The first structure is the fast structure, shown in Fig. 1.
978-1-5386-1620-8/17/$31.00 ©2017 IEEE
This full text paper was peer-reviewed at the direction of IEEE Instrumentation and Measurement Society prior to the acceptance and publication.