ACCEPTED. TO APPEAR IN IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 16, NO. 8, AUGUST 2007. 4
the similarity between grouped fragments.
The 3D transform can take advantage of both kinds of cor-
relation and thus produce a sparse representation of the true
signal in the group. This sparsity makes the shrinkage very
effective in attenuating the noise while preserving the features
of the signal.
Let us give a simple illustration of the benet of this
collaborative shrinkage by considering the grouped image
blocks shown in Figure 1. Let us rst consider the case
when no collaborative ltering is performed but instead a 2D
transform is applied separately to each individual block in a
given group of n fragments. Since these grouped blocks are
very similar, for any of them we should get approximately
the same number, say , of signicant transform coefcients.
It means that the whole group of n fragments is represented
by n coefcients. In contrast, in the case of collaborative
ltering, in addition to the 2D transform, we apply a 1D
transform across the grouped blocks (equivalent to applying
a separable 3D transform to the whole group). If this 1D
transform has a DC-basis element, then because of the high
similarity between the blocks, there are approximately
1
only
signicant coefcients that represent the whole group instead
of n. Hence, the grouping enhances the sparsity, which
increases with the number of grouped blocks.
As Figure 1 demonstrates, a strong similarity between small
image blocks at different spatial locations is indeed very
common in natural images. It is a characteristic of blocks
that belong to uniform areas, edges, textures, smooth intensity
gradients, etc. Therefore, the existence of mutually similar
blocks can be taken as a very realistic assumption when
modeling natural images, which strongly motivates the use
of grouping and collaborative ltering for an image denoising
algorithm.
III. ALGORITHM
In the proposed algorithm, the grouping is realized by block-
matching and the collaborative ltering is accomplished by
shrinkage in a 3D transform domain. The used image frag-
ments are square blocks of xed size. The general procedure
carried out in the algorithm is as follows. The input noisy
image is processed by successively extracting reference blocks
from it and for each such block:
nd blocks that are similar to the reference one (block-
matching) and stack them together to form a 3D array
(group);
perform collaborative ltering of the group and return
the obtained 2D estimates of all grouped blocks to their
original locations.
After processing all reference blocks, the obtained block
estimates can overlap and thus there are multiple estimates for
each pixel. We aggregate these estimates to form an estimate
of the whole image.
1
This is just a qualitative statement because the actual number of signicant
coefcients depends on the normalization of the transforms and on the
thresholds used for the 2D and 3D cases.
This general procedure is implemented in two different
forms to compose a two-step algorithm. This algorithm is
illustrated in Figure 3 and proceeds as follows:
Step 1. Basic estimate.
a) Block-wise estimates. For each block in the noisy
image, do the following.
i) Grouping. Find blocks that are similar to the
currently processed one and then stack them
together in a 3D array (group).
ii) Collaborative hard-thresholding. Apply a 3D
transform to the formed group, attenuate the
noise by hard-thresholding of the transform
coefcients, invert the 3D transform to produce
estimates of all grouped blocks, and return
the estimates of the blocks to their original
positions.
b) Aggregation. Compute the basic estimate of the
true-image by weighted averaging all of the ob-
tained block-wise estimates that are overlapping.
Step 2. Final estimate: using the basic estimate, perform im-
proved grouping and collaborative Wiener ltering.
a) Block-wise estimates. For each block, do the fol-
lowing.
i) Grouping. Use BM within the basic estimate to
nd the locations of the blocks similar to the
currently processed one. Using these locations,
form two groups (3D arrays), one from the
noisy image and one from the basic estimate.
ii) Collaborative Wiener ltering. Apply a 3D
transform on both groups. Perform Wiener
ltering on the noisy one using the energy
spectrum of the basic estimate as the true
(pilot) energy spectrum. Produce estimates of
all grouped blocks by applying the inverse 3D
transform on the ltered coefcients and return
the estimates of the blocks to their original
positions.
b) Aggregation. Compute a nal estimate of the true-
image by aggregating all of the obtained local
estimates using a weighted average.
There are two signicant motivations for the second step in
the above algorithm:
using the basic estimate instead of the noisy image allows
to improve the grouping by block-matching;
using the basic estimate as the pilot signal for the empir-
ical Wiener ltering is much more effective and accurate
than the simple hard-thresholding of the 3D spectrum of
the noisy data.
Observation model and notation
We consider a noisy image z : X ! R of the form
z (x) = y (x) + (x) , x 2 X,
where x is a 2D spatial coordinate that belongs to the image
domain X Z
2
, y is the true image, and is i.i.d. zero-
mean Gaussian noise with variance
2
, () N
0;
2
.