Figure 2: Sensor performance in a fog chamber with very dense fog (columns: RGB camera, gated camera, lidar, bird's eye view). The first row shows recordings without fog, while the second row shows the same scene in dense fog.
outdoor environments is an open challenge. Recent ap-
proaches tackle the lack of dense training data by propos-
ing semi-supervised methods relying on relative depth [10],
stereo images [15, 16, 31], sparse lidar points [31] or seman-
tic labels [62]. Passive methods have in common that their precision is more than an order of magnitude below that of scanning lidar systems, which makes them no viable alternative to ubiquitous lidar ranging in autonomous vehicles [51]. In this work, we propose a method that closes this precision gap using low-cost gated imagers.
Sparse Depth Completion. As an alternative approach to recovering accurate dense depth, recent works propose depth completion from sparse lidar measurements. Similar to monocular depth estimation, learned encoder-decoder architectures have been proposed for this task [11, 27, 37]. Jaritz et al. [27] propose to incorporate RGB color data for upsampling sparse depth samples, but also require sparse depth samples in downstream scene understanding tasks.
To allow for an independent design of depth estimation and
scene analysis algorithms, the completion architecture has
to be trained with varying sparsity patterns [27, 37] or ad-
ditional validity maps [11]. While these depth completion
methods offer improved depth estimates, they suffer from
the same limitation as scanned lidar: low spatial resolu-
tion at long ranges due to limited angular sampling, low-
resolution detectors, and costly mechanical scanning.
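To make the robustness requirement above concrete, the following minimal sketch (not the pipeline of any cited work; function and parameter names are illustrative) shows how a dense depth map can be randomly subsampled during training to simulate varying sparsity patterns, with the mask doubling as a validity map:

```python
import numpy as np

def random_sparsity(dense_depth, keep_fraction, rng=None):
    """Simulate a sparse lidar-like input by randomly masking a dense
    depth map. Varying `keep_fraction` across training batches mimics
    the varying sparsity patterns used to make completion networks
    robust; the boolean mask serves as a validity map.
    """
    rng = np.random.default_rng(rng)
    mask = rng.random(dense_depth.shape) < keep_fraction
    sparse = np.where(mask, dense_depth, 0.0)  # 0 marks missing depth
    return sparse, mask

# Toy example: keep ~30% of a constant 10 m depth map.
depth = np.full((4, 6), 10.0)
sparse, valid = random_sparsity(depth, keep_fraction=0.3, rng=0)
```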
Time-of-Flight Depth Cameras. Amplitude-modulated C-
ToF cameras [19, 30, 33], such as Microsoft’s Kinect One,
have become broadly adopted for indoor sensing [23, 53].
These cameras measure depth by recording the phase shift of periodically modulated flood illumination, from which the time of flight of the reflected flood light from the source to the scene and back to the camera can be extracted. How-
ever, in addition to the modulated light, this sensing ap-
proach also records all ambient background light. While
per-pixel lock-in amplification removes background com-
ponents efficiently in indoor scenarios [33], and learned ar-
chitectures can alleviate multi-path distortions [55], existing C-ToF cameras are limited to ranges of a few meters in outdoor scenarios with strong sunlight [22].
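The phase-to-depth relation underlying these cameras can be sketched as follows (a minimal illustration of the principle; the modulation frequency and measured phase are made-up example values, not parameters from the cited sensors):

```python
import math

def ctof_depth(phase_shift_rad, mod_freq_hz, c=299_792_458.0):
    """Depth from the phase shift of amplitude-modulated C-ToF light.

    The round trip of length 2d delays the modulation by
    phi = 2*pi*f*(2d/c), so d = c*phi / (4*pi*f). The estimate is
    unambiguous only for depths below c/(2f).
    """
    return c * phase_shift_rad / (4.0 * math.pi * mod_freq_hz)

# Example: 20 MHz modulation, measured phase of pi/2 -> d ~ 1.87 m;
# the unambiguous range at 20 MHz is c/(2f), roughly 7.5 m.
d = ctof_depth(math.pi / 2, 20e6)
```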
Gated cameras send out pulses of flood-light and only
record photons from a certain distance by opening and closing the camera after a given delay. Gated imaging was first proposed by Heckman et al. [21]. This acquisition mode allows backscatter from fog, rain, and snow to be gated out [18]. Busck et al. [3, 6, 7] use gated imaging for
high-resolution depth sensing by capturing large sequences
of narrow gated slices. However, as the depth accuracy is
inversely related to the gate width, and hence the number
of required captures, sequentially capturing high-resolution
gated depth is infeasible at real-time frame-rates. Recently,
a line of research proposes analytic reconstruction mod-
els for known pulse and integration shapes [34, 35, 61].
These approaches require perfect knowledge of the integra-
tion and pulse profiles, which is impractical due to drift,
and they provide low precision for broad gating windows
in real-time capture settings. Adam et al. [2] and Schober et al. [50] present Bayesian methods for pulsed time-of-flight imaging of room-sized scenes. These methods solve probabilistic per-pixel estimation problems using priors on depth, reflectivity, and ambient light, which is possible when using nanosecond exposure profiles [2, 50] for room-sized scenes. In this work, we demonstrate that exploiting spatio-temporal scene semantics allows recovering dense, lidar-accurate depth from only three slices, with exposures two orders of magnitude longer (> 100 ns), acquired in real time. Using such wide exposure gates allows us to rely on low-cost gated CMOS imagers instead of detectors with high temporal resolution, such as SPADs.
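The gating geometry described above can be sketched in a few lines (the delay and gate width below are illustrative values, not the actual settings used in this work):

```python
def gate_range_window(delay_s, gate_width_s, c=299_792_458.0):
    """Range interval imaged by a single gated slice.

    The camera opens after `delay_s` and stays open for `gate_width_s`;
    only photons with round-trip time in [delay, delay + width] are
    integrated, i.e. reflectors at ranges between c*delay/2 and
    c*(delay + width)/2 contribute to the slice.
    """
    r_min = c * delay_s / 2.0
    r_max = c * (delay_s + gate_width_s) / 2.0
    return r_min, r_max

# Example: a 100 ns gate opened after 200 ns images roughly 30-45 m.
r_min, r_max = gate_range_window(200e-9, 100e-9)
```

Note how a broad (> 100 ns) gate covers a range window of many meters, which is why per-slice intensities alone give only coarse depth and the reconstruction must combine several slices.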
3. Gated Imaging
In this section, we review gated imaging and propose an
analytic per-pixel depth estimation method.
Gated Imaging. Consider the setup shown in Figure 3,
where an amplitude-modulated source flood-illuminates the
scene with broad rect-shaped “pulses” of light. The syn-
chronized camera opens after a delay ξ to receive only pho-
tons with round-trip path-length longer than ξ · c, where c
is the speed of light. Assuming a dominating Lambertian reflector at distance r, the detector gain is temporally modulated with the gating function g, resulting in the exposure measurement
\[ I(r) = \alpha C(r) = \int_{-\infty}^{\infty} g(t - \xi)\, \kappa(t, r)\, dt, \tag{1} \]
where κ is the temporal scene response, α the albedo of
the reflector, and C (r) the range-intensity profile. With the
reflector at distance r, the temporal scene response can be
described as
\[ \kappa(t, r) = \alpha\, p\!\left(t - \frac{2r}{c}\right) \beta(r), \tag{2} \]
where p is the laser pulse profile, and atmospheric effects, e.g., in a scattering medium, are modeled by the