Kalman filtering of patches for frame-recursive video denoising
Pablo Arias
CMLA, CNRS, ENS Paris-Saclay
pablo.arias@cmla.ens-cachan.fr
Jean-Michel Morel
CMLA, CNRS, ENS Paris-Saclay
morels@cmla.ens-cachan.fr
Abstract
A frame recursive video denoising method computes
each output frame as a function of only the current noisy
frame and the previous denoised output. Frame recursive
methods were among the earliest approaches for video denoising.
However, in the last fifteen years they have been
used almost exclusively for real-time applications with de-
noising performance far from being state-of-the-art. In this
work we propose a simple frame recursive method which
is fast, has a low memory complexity and achieves re-
sults competitive with more complex state-of-the-art meth-
ods that require processing several input frames for pro-
ducing each output frame. Furthermore, in terms of visual
quality, the proposed approach is able to recover many de-
tails that are missed by most non-recursive methods. As
an additional contribution we also propose an off-line post-
processing of the denoised video that boosts denoising qual-
ity and temporal consistency.
1. Introduction
Denoising is a fundamental problem in image and video
processing, and a necessary step in almost any imaging
pipeline as the RAW data captured by the sensor is unavoid-
ably corrupted by noise. After decades of research the field
has evolved significantly, so much so that in the case of
still images it is difficult for a new method to obtain a sig-
nificant improvement over the state of the art (for white ad-
ditive Gaussian noise). Quite different is the situation in
video denoising: there is still a lot of room for improvement,
and some approaches have been little explored.
Although evidently related, the problems of image and
video denoising have important differences. The temporal
consistency of videos facilitates the denoising as it provides
a strong source of redundancy absent in still images. At the
same time it also brings a new challenge: the output of the
denoising algorithm is required to have the same temporal
consistency, a key element for the perceived quality of a
video [36, 28]. In addition, video denoising algorithms need
to process a much larger amount of data, which results in
more stringent design constraints for practical methods.
Currently, the best results are obtained by patch-based
methods [13, 29, 3, 17, 9, 38] that benefit from the fact that
video patches have several similar peers. They group to-
gether similar patches in highly redundant sets which can
therefore be denoised effectively.
Convolutional neural networks (CNN) have been suc-
cessfully applied to image denoising (e.g. [42, 43, 35])
but their application to video denoising has been limited so
far. In [12] a recurrent architecture is proposed, but the re-
sults are below the state-of-the-art. Recently, [15] reported
state-of-the-art results with a hybrid method which applies a
CNN to an image of “non-local features”: the values of the
centers of the most similar patches at each location. Some
works have tackled the related problem of burst denoising
[21, 30], but do not report results on video.
All these methods have in common that they produce
an output frame $u_t$ at time $t$ as a function of a number
of input noisy frames $f_s$ in a temporal vicinity:
$u_t = D(f_{t-h}, \dots, f_{t+h})$.¹
In spite of their good results, they
have some disadvantages: (i) They tend to be computa-
tionally costly, since they have to process a volume to pro-
duce each frame; (ii) They have a latency of h frames (this
could be avoided by using only past frames, possibly with
a penalty in the output quality); and (iii) They lack a direct
way to control the temporal consistency of the result.
Therefore they are only suited for off-line processing.
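The sliding-window form $u_t = D(f_{t-h}, \dots, f_{t+h})$ described above can be sketched as follows. This is only an illustration of the interface, not any of the cited methods: the stand-in denoiser `temporal_mean`, the function `denoise_windowed`, and the representation of a frame as a flat list of pixel values are all assumptions made for this example.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # assumption: a frame flattened to a list of pixel values


def temporal_mean(window: Sequence[Frame]) -> Frame:
    """Hypothetical stand-in for D: pixel-wise average over the window."""
    n = len(window)
    return [sum(pixels) / n for pixels in zip(*window)]


def denoise_windowed(
    frames: Sequence[Frame],
    h: int,
    D: Callable[[Sequence[Frame]], Frame] = temporal_mean,
) -> List[Frame]:
    """Compute u_t = D(f_{t-h}, ..., f_{t+h}) for every frame index t.

    The window is clipped at the sequence boundaries. Producing u_t needs
    f_{t+h}, so a streaming implementation lags h frames behind the input,
    and each output touches up to 2h + 1 input frames.
    """
    T = len(frames)
    out: List[Frame] = []
    for t in range(T):
        lo, hi = max(0, t - h), min(T, t + h + 1)
        out.append(D(frames[lo:hi]))
    return out
```

Even with this trivial choice of D, the sketch makes the costs listed above visible: every output frame reads a whole temporal window, and the output at time t cannot be emitted before frame t + h has arrived.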
An alternative approach is given by recursive algorithms
where the output at $t$ depends on the previous output:
$u_t = D(f_t, u_{t-1})$. These were among the earliest methods for
video denoising [7], and today they are used mainly
in algorithms that need to operate at real-time frame rates.
The works [45, 31] combine a spatial bilateral filter with a
temporal Kalman filter which is applied when no motion is
detected. Recursive versions of the non-local means algo-
rithm [8] were proposed in [23, 1]. Recently, [16] presented
a multi-resolution approach for real-time video denoising.
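As an illustration of the recursive form $u_t = D(f_t, u_{t-1})$, the following is a minimal sketch of a per-pixel temporal Kalman filter under a static-scene assumption (identity dynamics, no motion compensation). It is not the method of any of the cited works: the names `kalman_step` and `denoise_recursive`, the flat-list frame representation, and the parameters `sigma2` (noise variance, assumed known and positive) and `q` (process noise variance) are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # assumption: a frame flattened to a list of pixel values


def kalman_step(
    f_t: Frame, u_prev: Frame, p_prev: Frame, sigma2: float, q: float = 0.0
) -> Tuple[Frame, Frame]:
    """One recursive step u_t = D(f_t, u_{t-1}) for a static scene.

    p_prev is the per-pixel error variance of u_prev, sigma2 (> 0) the noise
    variance of f_t, and q the process noise of the identity prediction.
    """
    u_t: Frame = []
    p_t: Frame = []
    for f, u, p in zip(f_t, u_prev, p_prev):
        p_pred = p + q                   # variance of the predicted pixel
        k = p_pred / (p_pred + sigma2)   # Kalman gain
        u_t.append(u + k * (f - u))      # blend prediction and observation
        p_t.append((1.0 - k) * p_pred)   # variance after the update
    return u_t, p_t


def denoise_recursive(
    frames: Sequence[Frame], sigma2: float, q: float = 0.0
) -> List[Frame]:
    """Run the recursion over a sequence, starting from the first noisy frame."""
    u = list(frames[0])
    p = [sigma2] * len(u)
    out = [u]
    for f_t in frames[1:]:
        u, p = kalman_step(f_t, u, p, sigma2, q)
        out.append(u)
    return out
```

Only the previous output and its variance are kept in memory, and each output is available as soon as its input frame arrives. With `q = 0` the recursion reduces to a running temporal mean; a positive `q` lets the filter forget old frames, which is where a motion or change detector would have to intervene in a real video.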
The focus of these works is reducing the number of opera-
¹ The exceptions are the recurrent networks [12] and [21]. The former
does not achieve state-of-the-art results and the latter is for burst denoising
and cannot be directly applied to video.