Shape from Polarization for Complex Scenes in the Wild
Chenyang Lei*¹, Chenyang Qi*¹, Jiaxin Xie*¹, Na Fan¹, Vladlen Koltun², Qifeng Chen¹
¹HKUST ²Apple
Abstract
We present a new data-driven approach with physics-based
priors for scene-level normal estimation from a single
polarization image. Existing shape from polarization (SfP)
work mainly focuses on estimating the normals of a single
object rather than complex scenes in the wild. A key barrier
to high-quality scene-level SfP is the lack of real-world SfP
data in complex scenes. Hence, we contribute the first
real-world scene-level SfP dataset with paired input polarization
images and ground-truth normal maps. We then propose a
learning-based framework with a multi-head self-attention
module and viewing encoding, which is designed to handle
the increased polarization ambiguities caused by complex
materials and non-orthographic projection in scene-level SfP.
Our trained model generalizes to far-field outdoor
scenes because the relationship between polarized light and
surface normals is not affected by distance. Experimental
results demonstrate that our approach significantly
outperforms existing SfP models on two datasets. Our dataset
and source code will be publicly available at
https://github.com/ChenyangLEI/sfp-wild.
1. Introduction
Accurate surface normal estimation in the wild can provide
valuable information about a scene's geometry and
can be used in various computer vision tasks, including
segmentation [19], 3D reconstruction [26], and many
others [22, 33]. Normal estimation has therefore been studied
for a long time as an important task. However, estimating
high-quality normals in the wild is still an open problem.
Various techniques such as photometric stereo [9, 10] can
produce high-frequency normals, but most of them only
provide short-range object-level normal maps. Active depth
sensors offer another approach, in which normals are derived
from depth maps, but the corresponding depth maps are often
sparse (LiDAR) or noisy (time-of-flight, structured light),
so normals cannot be estimated from them reliably. Also, the
depth range of active sensors is limited.
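To make the depth-based route concrete, the following is a minimal sketch of deriving normals from a dense depth map by finite differences. It assumes an orthographic camera, a uniform pixel spacing, and a convention in which the normal's z component is positive; the function name `normals_from_depth` and its parameters are hypothetical and are not part of this work.

```python
import numpy as np

def normals_from_depth(depth, pixel_size=1.0):
    """Estimate per-pixel unit normals from a dense depth map.

    Assumptions (illustrative only): orthographic projection, uniform
    pixel spacing, and normals oriented so that their z component is
    positive.
    """
    depth = np.asarray(depth, dtype=np.float64)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    dz_dy, dz_dx = np.gradient(depth, pixel_size)
    # For a surface z = f(x, y), an (unnormalized) normal is (-f_x, -f_y, 1).
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# Usage: a tilted plane z = 0.5 * x has the same normal at every pixel.
yy, xx = np.mgrid[0:4, 0:5].astype(np.float64)
plane_normals = normals_from_depth(0.5 * xx)
```

Because the normals depend on derivatives of depth, sensor noise is amplified by the differencing step, and holes in a sparse depth map leave the gradient undefined.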
*Joint first authors
[Figure 1 panels, left to right: Input I_un, Input ϕ, Without pol., With pol.]
Figure 1. Our method can estimate dense scene-level surface
normals from a single polarization image. Polarization can provide
effective cues for obtaining more accurate results. In the first row,
polarization provides geometry cues for our model so that it is not
fooled by objects in the printed image on a wall. In the second
and third rows, polarization provides guidance for planes with
different surface normals even when their materials are quite similar.
I_un: unpolarized image; ϕ: angle of polarization.
In this work, we are interested in estimating surface
normals from a single polarization image for complex scenes in
the wild. Since the polarization of light changes differently
when the light interacts with surfaces of different shapes
and materials (governed by the Fresnel equations [12]),
polarization images can provide dense surface orientation
cues from the polarized light perceived at each pixel. Also,
compared with active sensors and object-level normal
estimation techniques (e.g., photometric stereo), the
polarization camera is a passive sensor that is not constrained to a
specific depth range. Thus polarization images are a promising
data source for accurate normal estimation in the wild.
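The per-pixel cues named in Figure 1 (the unpolarized image I_un and the angle of polarization ϕ) are commonly computed from intensities captured behind linear polarizers at several orientations. The following is a minimal sketch, assuming a capture at 0°, 45°, 90°, and 135° and the standard sinusoidal model I(θ) = I_un (1 + ρ cos(2ϕ − 2θ)), where ρ is the degree of polarization. The function names `synthesize` and `polarization_cues` are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def synthesize(i_un, rho, phi, polarizer_deg):
    """Intensity behind a linear polarizer at angle polarizer_deg (degrees),
    under the assumed model I = I_un * (1 + rho * cos(2*phi - 2*theta))."""
    theta = np.deg2rad(polarizer_deg)
    return i_un * (1.0 + rho * np.cos(2.0 * phi - 2.0 * theta))

def polarization_cues(i0, i45, i90, i135, eps=1e-8):
    """Recover (I_un, rho, phi) from four polarizer-angle images."""
    i0, i45, i90, i135 = (np.asarray(x, dtype=np.float64)
                          for x in (i0, i45, i90, i135))
    # Linear Stokes components.
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity, equals 2 * I_un
    s1 = i0 - i90                       # equals 2 * I_un * rho * cos(2*phi)
    s2 = i45 - i135                     # equals 2 * I_un * rho * sin(2*phi)
    i_un = 0.5 * s0
    rho = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, eps)
    # The angle of polarization is only defined modulo pi.
    phi = np.mod(0.5 * np.arctan2(s2, s1), np.pi)
    return i_un, rho, phi

# Usage: round-trip a synthetic pixel through the model.
angles = (0, 45, 90, 135)
images = [synthesize(0.5, 0.3, 0.7, a) for a in angles]
i_un, rho, phi = polarization_cues(*images)
```

Under this model, ϕ and ϕ + π produce identical measurements, so the recovered angle is only determined up to a π shift.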
However, estimating normals from a polarization image
for complex scenes (scene-level SfP) is challenging. To
the best of our knowledge, no existing SfP work focuses
on complex scenes, and several challenges are yet to be
solved. First, polarization contains ambiguities from
unknown information such as object materials and reflection