MULTIFRAME RAW-DATA DENOISING BASED ON BLOCK-MATCHING AND
3-D FILTERING FOR LOW-LIGHT IMAGING AND STABILIZATION
Giacomo Boracchi∗ and Alessandro Foi∗∗
∗ Dipartimento di Elettronica e Informazione, Politecnico di Milano
via Ponzio 34/5, 20133, Milano, Italy
web: http://home.dei.polimi.it/boracchi email: firstname.lastname@polimi.it
∗∗ Department of Signal Processing, Tampere University of Technology
P.O. Box 553, 33101, Tampere, Finland
web: http://www.cs.tut.fi/~foi email: firstname.lastname@tut.fi
ABSTRACT
We consider the problem of the joint denoising of a number of raw-
data images from a digital imaging sensor. In particular, we exploit
a recently proposed image modeling [8] that incorporates both the
signal-dependent nature of noise and the clipping of the data due to
under- or over-exposure of the sensor.
Our denoising approach is based on the V-BM3D algorithm [5],
coupled with a set of homomorphic pre- and post-processing
transformations derived for variance stabilization, debiasing, and
declipping [6]. The spatiotemporal nonlocality of V-BM3D frees us from
the need for an explicit registration of the frames. The result is a
practical algorithm directly applicable to raw-data processing, in
particular under heavy-noise conditions such as those encountered in
low-light imaging or imaging at fast shutter speeds.
Experiments with synthetic images and with real raw data from a
CCD sensor show the feasibility of the approach and provide an
indicative measure of the advantage of multiframe versus
single-frame processing.
1. INTRODUCTION
Pictures acquired by digital imaging sensors are always subject to
noise. While the signal-to-noise ratio (SNR) can be improved by
using a longer exposure time, this is often not feasible because scene
motion (e.g., due to moving objects) or camera motion, also
referred to as camera shake, during the acquisition would result in
blur. The problem is particularly evident when acquiring images in
low-light conditions.
A number of diverse solutions have been devised to cope with
this kind of problem. These range from hardware solutions, such
as optical stabilization based on real-time motion-adaptive sensor or
lens actuation, to different acquisition paradigms. Particularly
effective for compensating the impact of motion or hand-held camera
shake blur is the approach based on pairs of differently exposed
images [14, 13, 17, 18, 16, 15]. The key idea is to capture two images:
one taken with a short exposure time, which ensures that the
blur is negligible at the expense of heavy noise, and another
taken with a longer exposure, which reduces the noise but
results in visible blur. Provided that the two images are registered,
the noisy image is used to estimate the blur point-spread function
(PSF), thus enabling a non-blind or semi-blind deconvolution of the
blurred image. However, scene motion or camera shake can very seldom
be faithfully described as a linear, shift-invariant blur; thus, heavy
regularization is necessary to reduce artifacts [17].
This work was supported by the Academy of Finland (application no.
213462, Finnish Programme for Centres of Excellence in Research 2006-
2011, and application no. 118312, Finland Distinguished Professor Pro-
gramme 2007-2010) and by CIMO, the Finnish Centre for International
Mobility (fellowship number TM-07-4952). Part of this work was carried
out during the first author’s visit at Tampere International Center for Signal
Processing (TICSP) in July-October 2007.
An alternative strategy is based on the joint denoising of
multiple images captured sequentially, which makes the problem
conceptually equivalent to a video-denoising problem. In this paper,
we follow this direction and consider the problem in the very
specific setting of raw-data processing, through an observation model
[8] that explicitly incorporates both the signal-dependent nature of
noise and the clipping of the data due to under- or over-exposure of
the sensor.
In our approach, we rely on the Video Block-Matching 3-D
denoising algorithm (V-BM3D) [5], coupled with a set of homomorphic
pre- and post-processing transformations derived for variance
stabilization, debiasing, and declipping [6]. The spatiotemporal
nonlocality of V-BM3D frees us from the need for an explicit
registration of the frames, while the homomorphic transformations
enable an accurate estimation of the true image. Overall, the result
is a practical algorithm directly applicable to multiframe raw-data
processing, which simultaneously extends [5] and [6].
The rest of the paper is organized as follows. Section 2
introduces the observation model and the principal ideas of the V-BM3D
filter. The proposed denoising algorithm is detailed in Section 3.
Experiments with synthetic images and with real raw data from a
CCD sensor are presented in Section 4, where we also compare
against the approach based on blurred-noisy image pairs. We
conclude the paper with a few remarks about the impact of redundancy
on the denoising performance.
2. PRELIMINARIES
2.1 Observation model
Let $\{\tilde{z}_i\}_{i=1}^{N}$ be a sequence of $N$ raw-data images. According to
[8, 6], each image $\tilde{z}_i : X \to [0,1]$ can be modeled as
$$\tilde{z}_i(x) = \max\{0, \min\{z_i(x), 1\}\}, \quad x \in X \subset \mathbb{Z}^2, \tag{1}$$
where
$$z_i(x) = y_i(x) + \sigma(y_i(x))\,\xi_i(x), \tag{2}$$
$y_i : X \to Y \subseteq \mathbb{R}$ is a deterministic unknown original image and
$\sigma(y_i(x))\,\xi_i(x)$ is a zero-mean random error with signal-dependent
standard deviation $\sigma(y_i(x))$. Here, $\sigma : \mathbb{R} \to \mathbb{R}^+$ is a deterministic
function while $\xi_i(x)$ is a random variable with unit variance.
For simplicity, the latter shall be approximated as a standard normal
and all errors are assumed to be independent, thus treating $\xi_i$ as
i.i.d. with $\xi_i(\cdot) \sim \mathcal{N}(0,1)$. As discussed in [8], this is a suitable
approximation when dealing with the noise in the raw data from
CMOS and CCD digital imaging sensors. For these raw data, the
typical form of the function $\sigma$ is
$$\sigma^2(y_i(x)) = a\,y_i(x) + b, \tag{3}$$
with the constants $a \in \mathbb{R}^+$ and $b \in \mathbb{R}$ depending on the sensor’s
specific characteristics and on the particular acquisition settings (e.g.,
analog gain or ISO value, temperature, pedestal, etc.) [8]. These
two constants are assumed fixed and invariant during the acquisition
of the images and thus the same for all $i = 1, \dots, N$.
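As an illustration only, the observation model of Eqs. (1)-(3) can be simulated with a few lines of NumPy. The sketch below is not part of the proposed algorithm. It assumes a static scene, so that all frames share the same original image y (the model itself allows a different y_i per frame), and the function name `simulate_raw_frames`, the ramp test image, and the values of a and b are hypothetical choices made for the example, not values taken from a real sensor.

```python
import numpy as np


def simulate_raw_frames(y, a, b, n_frames, seed=0):
    """Draw n_frames clipped noisy observations of y following Eqs. (1)-(3).

    Assumption for this sketch: the scene is static, i.e. y_i = y for all i.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=np.float64)
    # Eq. (3): signal-dependent variance a*y + b, floored at zero because
    # b may be negative (e.g., due to the pedestal).
    sigma = np.sqrt(np.maximum(a * y + b, 0.0))
    # Eq. (2): z_i = y + sigma(y) * xi_i, with xi_i i.i.d. standard normal.
    xi = rng.standard_normal((n_frames,) + y.shape)
    z = y + sigma * xi
    # Eq. (1): clipping to [0, 1] models under- and over-exposure.
    return np.clip(z, 0.0, 1.0)


# Hypothetical test image: a horizontal ramp covering the whole range [0, 1].
y = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
frames = simulate_raw_frames(y, a=0.01, b=0.0004, n_frames=8)
```

In such a simulation the sample mean of the clipped data near y = 0 is larger than y itself, since negative errors are clipped away. This is the kind of bias that the debiasing and declipping transformations mentioned above are meant to address.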