Coded Photography is a notion of an 'out-of-the-box'
photographic method, in which individual (ray) samples
or data sets are not comprehensible as 'images' without
further decoding, re-binning or reconstruction. For
example, a wrap-around view is built from images taken
with multiple centers of projection, using only a few
pixels from each input image. Other examples include
confocal images and coded aperture images.
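As an illustration of this kind of decoding, the following is a minimal sketch (in Python with NumPy) of assembling a wrap-around view by keeping only a narrow strip of pixels from each input image; the function name, the strip width and the choice of the central column are illustrative assumptions, not a prescribed algorithm.

```python
import numpy as np

def wraparound_view(images, strip_width=4):
    # Illustrative sketch: each input image comes from a different center of
    # projection; only a narrow central strip of columns is kept from each,
    # and the strips are concatenated into a single multi-perspective view.
    strips = []
    for img in images:
        w = img.shape[1]
        c = w // 2                      # central column of this viewpoint (assumed)
        strips.append(img[:, c:c + strip_width])
    return np.concatenate(strips, axis=1)
```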
We may be converging on a new, much more capable
'box' of parameters in computational photography that
we don't yet recognize; there is still quite a bit of
innovation to come!
In the rest of the STAR, we survey recent techniques
that exploit exposure, focus and active illumination.
3 High Dynamic Range
3.1 Multiple Exposures
One approach to capturing high dynamic range scenes is
to capture multiple images using different exposures and
then merge them. The basic idea is that at high
exposures, dark regions are well imaged but bright
regions are saturated; at low exposures, dark regions are
too dark but bright regions are well imaged. If the
exposure is varied over multiple pictures of the same
scene, the value of each pixel can be taken from those
images in which it is neither too dark nor saturated. This
type of approach is often referred to as exposure
bracketing, and has been widely adopted [Morimura
1993, Burt and Kolczynski 1993, Madden 1993, Tsai
1994]. Imaging devices are usually nonlinear, so pixel
values are nonlinearly related to the brightness values in
the scene. Some authors have proposed using images
acquired under different exposures to estimate the
radiometric response function of an imaging device, and
using the estimated response function to process the
images before merging them [Mann and Picard 1995,
Debevec and Malik 1997, Mitsunaga and Nayar 1999].
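As a rough sketch of such merging (not the specific algorithm of any of the cited papers), the following Python/NumPy fragment weights each exposure so that pixels that are neither too dark nor saturated dominate; the tent-shaped weight and the 8-bit input are illustrative assumptions, and inv_response stands in for a calibrated radiometric response (a linear camera, i.e. the identity, is assumed by default).

```python
import numpy as np

def merge_exposures(images, exposure_times, inv_response=lambda z: z):
    # Sketch of multi-exposure HDR merging. Each pixel's radiance is estimated
    # from the exposures in which it is well exposed: a tent weight downweights
    # values near 0 (too dark) and near 1 (saturated).
    num = np.zeros(images[0].shape, dtype=np.float64)
    den = np.zeros_like(num)
    for img, t in zip(images, exposure_times):
        z = img.astype(np.float64) / 255.0         # assumes 8-bit input
        w = 1.0 - np.abs(2.0 * z - 1.0)            # tent weight, peak at mid-gray
        num += w * inv_response(z) / t             # radiance = response^-1(z) / exposure
        den += w
    return num / np.maximum(den, 1e-6)
```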
3.2 Sensor Design
At the sensor level, various approaches have also been
proposed for high dynamic range imaging. One type of
approach is to use multiple sensing elements with
different sensitivities within each cell [Street 1998,
Handy 1986, Wen 1989, Hamazaki 1996]. Multiple
measurements are made from the sensing elements, and
they are combined on-chip before a high dynamic range
image is read out from the chip. These devices lower the
spatial sampling rate and thus sacrifice spatial
resolution. Another type of approach is to adjust the
well capacity of the sensing elements during
photocurrent integration [Knight 1983, Sayag 1990,
Decker 1998], but this increases noise. A different
approach is proposed by [Brajovic and Kanade 1996],
where the time each sensing element takes to reach
saturation is measured by a computation element
attached to it. This time encodes high dynamic range
information, as it is inversely proportional to the
brightness at each pixel. Logarithmic sensors [Scheffer
et al. 2000] have also been proposed to increase the
dynamic range. Brightside exploits the interline transfer
of a charge coupled device (CCD) based camera to
capture two exposures within a single mechanical
shutter actuation.
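The time-to-saturation idea can be summarized in a few lines; the following sketch assumes an idealized pixel whose well fills at a rate proportional to scene brightness, so the well_capacity constant and the clamping are illustrative rather than part of the cited design.

```python
import numpy as np

def brightness_from_saturation_time(t_sat, well_capacity=1.0):
    # If charge accumulates at a rate proportional to scene brightness, the
    # well reaches its capacity after time t_sat = well_capacity / brightness,
    # so brightness is recovered as the reciprocal of the measured time.
    return well_capacity / np.maximum(t_sat, 1e-9)
```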
High dynamic range sensor design is in progress, but the
implementation is usually costly. A rather novel and
flexible approach is proposed by [Nayar and Mitsunaga
2000, Narasimhan and Nayar 2005], where exposure
varies spatially across the imager. A pattern of varying
sensitivities is applied to the pixel array. It resembles the
Bayer pattern in color imaging, but the sampling is
along the exposure dimension instead of wavelength.
The particular form of the sensitivity pattern, and the
way of implementing it, are both quite flexible. One way
of implementing it is to place a mask with cells of
varying optical transparencies in front of the sensing
array. Here, just as in the Bayer mosaic, spatial
resolution is sacrificed to some extent and aliasing can
occur. Measurements
under different exposures (sensitivities) are spatially
interpolated, and combined into a high dynamic range
image.
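A minimal sketch of this reconstruction is given below, under the assumption of a 2x2 tile of relative sensitivities and simple neighborhood averaging for the interpolation; the actual pattern and interpolation in the cited work may differ.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sve_reconstruct(mosaic, pattern, low=0.02, high=0.98):
    # `mosaic` holds normalized pixel values captured under a spatially varying
    # exposure pattern; `pattern` is a 2x2 array of relative sensitivities tiled
    # across the sensor (analogous to a Bayer mosaic, but sampling exposure).
    h, w = mosaic.shape
    sens = np.tile(pattern, (h // 2 + 1, w // 2 + 1))[:h, :w]
    valid = (mosaic > low) & (mosaic < high)        # neither too dark nor saturated
    radiance = np.where(valid, mosaic / sens, 0.0)
    # Fill badly exposed pixels by averaging valid neighbors in a 3x3 window.
    num = uniform_filter(radiance, size=3)
    den = uniform_filter(valid.astype(float), size=3)
    return np.where(valid, radiance, num / np.maximum(den, 1e-6))
```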
4 Aperture and Focus
Several concepts in exploiting focus and aperture
parameters can be understood by considering the
transfer of the 4D lightfield through the lens and its 2D,
3D or 4D projection recorded on the image sensor.
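To make the projection concrete, the following sketch integrates a discretized 4D lightfield over the aperture coordinates to form a 2D photograph; the shift-and-add refocusing and the integer-pixel shifts are simplifying assumptions for illustration, not a specific published implementation.

```python
import numpy as np

def image_from_lightfield(L, focus_shift=0.0):
    # L[u, v, s, t]: 4D lightfield with aperture coordinates (u, v) and sensor
    # coordinates (s, t). Summing over the aperture yields a conventional 2D
    # photograph; shifting each (s, t) slice according to (u, v) before the sum
    # synthetically refocuses the image.
    U, V, S, T = L.shape
    img = np.zeros((S, T))
    for u in range(U):
        for v in range(V):
            du = int(round(focus_shift * (u - U // 2)))
            dv = int(round(focus_shift * (v - V // 2)))
            img += np.roll(np.roll(L[u, v], du, axis=0), dv, axis=1)
    return img / (U * V)
```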
Defocus Video Matting
Video matting is the process of recovering a high-
quality alpha matte and foreground from a video
sequence. Common approaches require either a known
background (e.g., a blue screen) or extensive user
interaction (e.g., to specify known foreground and
background elements). The matting problem is generally
under-constrained, unless additional information is
recorded at the time of capture. McGuire et al. have
proposed a novel, fully autonomous method for pulling
a matte using multiple synchronized video cameras that
share the center of projection but differ in their plane of
focus [McGuire et al. 2005]. The multi-camera data
stream over-constrains the problem and the solution is
obtained by directly minimizing the error in filter-based
image formation equations. Their system solves the fully
dynamic video matting problem without user assistance:
both the foreground and background may be high
frequency and have dynamic content, the foreground
may resemble the background, and the scene may be lit
by natural (as opposed to polarized or collimated)
illumination. The authors capture 3 synchronized video