Surface Height Map Estimation from a Single Image Using
Convolutional Neural Networks
Xiaowei Zhou†, Guoqiang Zhong†, Lin Qi†, Junyu Dong*†, Tuan D. Pham§ and Jianzhou Mao‡
†Ocean University of China, §Linkoping University, ‡Macau University of Science and Technology
Email: dongjunyu@ouc.edu.cn.
ABSTRACT
Surface height map estimation is an important task in high-resolution 3D reconstruction. This task differs from
general scene depth estimation in that surface height maps contain more high-frequency information or fine
details. Existing methods based on radar or other equipment can recover large-scale scene depth, but may
fail at small-scale surface height map estimation. Although some methods reconstruct surface height
from multiple images, e.g. photometric stereo, estimating a height map directly from a single image remains a
challenging problem. In this paper, we present a novel method based on convolutional neural networks (CNNs) for
estimating the height map from a single image, without any additional equipment or prior knowledge of the image contents.
Experimental results on procedural and real texture datasets show that the proposed algorithm is effective and reliable.
Keywords: Surface height map estimation, surface reconstruction, CNNs.
1. INTRODUCTION
Surface height map estimation is very important for high-resolution 3D reconstruction. A surface height map is a
raster image that stores height values, such as the geometric height of a bumpy surface [1]. A height map can be displayed as
the luma of a grayscale image (Fig. 1(a)). The height map can also be used to render the surface, e.g. with LuxRender (Fig. 1(b)).
Traditionally, large-scale scene depth or geometry information is captured with the aid of active equipment, such as
radar, laser scanners and Kinect [2]. Liu et al. proposed a method that uses the results of an Interferometric SAR (InSAR)
analysis to estimate the height map of high-rise buildings [3]. However, these methods may fail to estimate the height
map of a textured surface, which contains high-frequency information or fine details. On the other hand, some methods
estimate the surface height map from multiple images, e.g. photometric stereo [4], or from a single image
with prior knowledge [5]. Photometric stereo, introduced in [4], estimates the surface from several images taken under
different lighting directions. Han et al. estimated the detailed shape of objects from a single RGB-D image by estimating
a lighting model and applying a shape-from-shading approach [5]. Nevertheless, estimating a surface height map from a
single image without additional equipment or prior knowledge of the image contents is still a challenging and open issue.
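To make the height-map representation above concrete, the following is a minimal sketch, not the rendering pipeline used in this paper. It stores a height map as a 2D array, converts it to a grayscale (luma) image, and shades it with a simple Lambertian model under one distant light. The Lambertian shading is a stand-in chosen for brevity (the paper renders with LuxRender), and the function names, the light direction and the synthetic surface are all hypothetical.

```python
import numpy as np

def height_to_gray(height):
    """Normalize a height map to an 8-bit grayscale (luma) image."""
    h = np.asarray(height, dtype=np.float64)
    span = h.max() - h.min()
    if span == 0:
        # A perfectly flat surface carries no contrast.
        return np.zeros(h.shape, dtype=np.uint8)
    return np.round(255 * (h - h.min()) / span).astype(np.uint8)

def render_lambertian(height, light=(1.0, 1.0, 1.0), albedo=1.0):
    """Shade a height map with a single distant light (Lambertian model).

    This is an illustrative substitute for a physically based renderer.
    """
    h = np.asarray(height, dtype=np.float64)
    # Finite-difference slopes along rows (y) and columns (x).
    dz_dy, dz_dx = np.gradient(h)
    # Unnormalized normal of the surface z = h(x, y) is (-dz/dx, -dz/dy, 1).
    normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(h)))
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    l = np.asarray(light, dtype=np.float64)
    l = l / np.linalg.norm(l)
    # Intensity is albedo * max(0, n . l); clip keeps it in [0, 1].
    return albedo * np.clip(normals @ l, 0.0, 1.0)

# Hypothetical synthetic bumpy surface, for illustration only.
y, x = np.mgrid[0:64, 0:64]
height = np.sin(x / 5.0) * np.cos(y / 7.0)

gray = height_to_gray(height)      # height map shown as luma
image = render_lambertian(height)  # shaded rendering of the same surface
```

The rendered image depends on both the surface slopes and the lighting, which is part of why recovering the height map from a single image is ill-posed without further assumptions.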
Figure 1. (a) Surface height map; (b) rendered image.
Figure 2. The original surface height maps and the surface height maps estimated by CNN (ours), SSAE and SC.
In recent years, deep learning has achieved state-of-the-art results in many fields. Liu et al. proposed a method based
on deep CNNs [6] and a conditional random field (CRF) to estimate depth from a single image [7]. Eigen and Fergus predicted depth, surface normals and
semantic labels using a common multi-scale convolutional architecture [8]. These methods focus on predicting
general large-scale scene depth, such as that of meeting rooms and dining halls. Our task differs from general scene depth
estimation in that surface height maps contain more high-frequency information or fine details. In contrast to
previous work, we propose a method based on convolutional neural networks to predict a surface height map from a
single image without extra equipment or prior knowledge. We have performed extensive experiments on a procedural
texture dataset [9] and a real texture dataset, PerTex [10], to verify the effectiveness of the proposed method.
Experimental results show that our method is effective at estimating the surface height map from a rendered image. The