sparse matching algorithm to find matching seed pixels. We then use a propagation strategy to compute rough motion vectors for the image. Finally, we apply the Lucas–Kanade method to obtain the final motion vectors.
It is widely understood that unpredictable factors, such as blurring and noise, can cause ill-posed problems in super-resolution reconstruction.^{3,15} Because super-resolution can be translated into a search for an optimized solution,^{3,5} regularization is widely used in the optimization process. Many Tikhonov-based and total variation (TV)-based regularization methods for super-resolution^{3,16,17} have been proposed to solve such ill-posed problems. By using TV regularization, which is widely employed in denoising and deblurring,^{15,18} the ill-posed super-resolution problem becomes optimizable. This method has the advantage of preserving edges while not severely penalizing steep local gradients; it can, therefore, be reasonably employed in a wide range of applications. In addition, many regularization-based multivideo super-resolution reconstruction methods have been proposed.^{3,19} The basic steps of multivideo super-resolution involve the space–time alignment and reconstruction of multiple images. However, managing the alignment parameters of two cameras is challenging.
In this paper, we apply a robust super-resolution algorithm that solves an $\ell_1$-norm minimization comprising data-fusion and regularization terms. Whereas the data-fusion term accounts for the motion, blur, and downsampling degradation factors, the regularization term mainly serves to preserve edges.
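For orientation, the overall objective described above typically takes the form below; this is only an illustrative sketch, in which $A_k$ stands for the combined degradation (motion, blur, and downsampling) applied to the HR frame $F$, and the weight $\lambda$ and regularizer $R(\cdot)$ are generic placeholders rather than the exact terms used later in this paper:

\[
\hat{F} = \arg\min_{F} \sum_{k} \left\| A_k F - Y_k \right\|_1 + \lambda\, R(F),
\]

where the first (data-fusion) term enforces $\ell_1$ fidelity to the observed LR frames $Y_k$ and $R(F)$ is an edge-preserving regularizer such as total variation.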
The remainder of this paper is organized as follows. In
Sec. 2, we introduce the LR video observation and super-
resolution model. In Sec. 3, we present our registration
algorithm, and in Sec. 4, we introduce our super-resolution
algorithm. In Sec. 5, we describe the numerous experiments
we performed to verify our registration accuracy and the
effectiveness of the super-resolution. Our conclusions are
presented in Sec. 6.
2 Low-Resolution Video Observation and
High-Resolution Video Reconstruction Model
In the LR video observation and HR video reconstruction model, the original HR dynamic frame is denoted as $F$. It can be assumed that, after subpixel and subframe shifting, spatial blurring, downsampling, and the introduction of noise, the HR video is degraded to an LR video (Fig. 1). In model (1), $D_k$ represents the spatial decimation matrix associated with the $k$'th LR video frame, specifically a two-scale downsampling operation in the spatial domain, $D = \frac{1}{4}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. $T_k$, which is represented as a map $[v_{(i,j)}]$ indicating the motion direction and position of every pixel, is the geometric motion operator between the HR scene and the $k$'th LR frame $Y_k$; $H_k$ is the camera point spread function (PSF) model. This degradation can be expressed by^{3,20,21}
\[
Y_k = D_k H_k T_k F + n_k, \qquad \text{frame number } k = 1, \ldots, N. \tag{1}
\]
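As a concrete illustration of Eq. (1), the sketch below simulates the degradation of a single HR frame in Python; the Gaussian PSF width, the noise level, and the per-pixel translation map are illustrative assumptions rather than parameters specified by the paper, while the 2 × 2 averaging implements the decimation operator $D$ given above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def degrade_frame(F, flow, psf_sigma=1.0, noise_std=2.0, rng=None):
    """Simulate Y_k = D_k H_k T_k F + n_k for one frame (Eq. (1)).

    F         : HR frame, 2-D float array with even height and width.
    flow      : per-pixel motion map [v_(i,j)], shape (2, H, W) holding (dy, dx).
    psf_sigma : width of the Gaussian standing in for the camera PSF H_k.
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = F.shape

    # T_k: warp the HR frame along the per-pixel motion map (geometric motion operator).
    yy, xx = np.mgrid[0:H, 0:W].astype(float)
    warped = map_coordinates(F, [yy + flow[0], xx + flow[1]], order=1, mode='nearest')

    # H_k: camera PSF, modeled here as an isotropic Gaussian blur.
    blurred = gaussian_filter(warped, sigma=psf_sigma)

    # D_k: two-scale decimation, averaging each 2x2 block (the (1/4)[[1,1],[1,1]] operator).
    decimated = 0.25 * (blurred[0::2, 0::2] + blurred[1::2, 0::2]
                        + blurred[0::2, 1::2] + blurred[1::2, 1::2])

    # n_k: additive noise.
    return decimated + rng.normal(0.0, noise_std, decimated.shape)
```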
For a camera-obtained video sequence, it is often assumed that redundant information between adjacent frames can be used to reconstruct the current frame. We rewrite the multiframe super-resolution estimator as the following minimization:^{3,5,20,21}
\[
\tilde{F} = \arg\min_{F} \sum_{k=s}^{t} \left\| D_k H_k T_k F - Y_k \right\|_p^p, \tag{2}
\]
where $1 \le p \le 2$, each $p$ corresponds to an $L_p$-norm estimator, and $p \to 1$ gives the most robust cost function. In this paper, the choice of the parameter $p$ is not our research priority; for simplicity, $p = 1$ is applied. Frames $s$ to $t$, acting as a sliding window, are used to reconstruct the current frame $F$; the sliding-window model of Ref. 11 is adopted in this paper.
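To make Eq. (2) concrete for $p = 1$, the following sketch runs a plain subgradient descent on the $\ell_1$ data term over the frames of a sliding window. The forward and adjoint operators are passed in as callables, and the step size and iteration count are arbitrary example values; the paper's actual solver also includes the regularization term discussed in Sec. 4.

```python
import numpy as np

def l1_superres(Y_frames, forward_ops, adjoint_ops, hr_shape, mu=0.05, n_iters=50):
    """Subgradient descent on sum_k ||A_k F - Y_k||_1, i.e., Eq. (2) with p = 1.

    Y_frames    : list of LR frames Y_s..Y_t inside the sliding window.
    forward_ops : list of callables; forward_ops[k](F) applies D_k H_k T_k.
    adjoint_ops : list of callables; adjoint_ops[k](r) applies (D_k H_k T_k)^T.
    hr_shape    : shape of the HR estimate F.
    """
    F = np.zeros(hr_shape)
    for _ in range(n_iters):
        grad = np.zeros(hr_shape)
        for Yk, A, At in zip(Y_frames, forward_ops, adjoint_ops):
            # Subgradient of ||A_k F - Y_k||_1 is A_k^T sign(A_k F - Y_k).
            grad += At(np.sign(A(F) - Yk))
        F -= mu * grad  # plain subgradient step; a regularization term would be added here
    return F
```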
3 Joint Propagation and Lucas–Kanade Image
Registration
3.1 Seed Selection for Propagation
The frames of a video are typically dynamic: the movement of both the camera and the objects in the scene causes differences in frame content. The camera motion between the acquisition of two digital frames of a flat scene can be approximated by an affine mapping. The apparent deformation of a planar scene is a planar homographic transform, which is smooth. Simplified local perspective effects for any scene area can therefore be modeled by a six-parameter local transform of image coordinates:^{11}
\[
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}. \tag{3}
\]
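A minimal sketch of the coordinate mapping in Eq. (3); the rotation angle and translation offsets in the usage example are arbitrary values chosen for illustration.

```python
import numpy as np

def transform_coords(x, y, theta, dx, dy):
    """Map (x, y) to (x', y') with the rotation-plus-translation of Eq. (3)."""
    c, s = np.cos(theta), np.sin(theta)
    x_new = c * x - s * y + dx
    y_new = s * x + c * y + dy
    return x_new, y_new

# Example: rotate pixel coordinates by 5 degrees and shift by (2.5, -1.0).
xs, ys = np.meshgrid(np.arange(640), np.arange(480))
xp, yp = transform_coords(xs, ys, np.deg2rad(5.0), 2.5, -1.0)
```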
The Harris and Hessian affine invariant detectors^{22,23} are two methods that normalize the six parameters of the affine transform. They first detect key points in scale space and then apply affine normalization to estimate the parameters of elliptical regions. In this paper, the Harris affine invariant detector is used for region detection. Because SIFT matching normalizes rotations, translations, and scaling, it is the only fully scale-invariant detector. Hence, it is a suitable method for finding the initial matches between two frames even when they are not adjacent or when the camera is moving during shooting.
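As an illustration of this seed-selection step, the sketch below obtains initial seed matches between two grayscale frames with OpenCV; using SIFT's own detector (rather than the Harris affine detector adopted in this paper) and a 0.7 ratio-test threshold are assumptions made for the example.

```python
import cv2

def initial_seed_matches(frame_a, frame_b, ratio=0.7):
    """Find initial matching seed pixels between two grayscale frames."""
    sift = cv2.SIFT_create()
    kps_a, desc_a = sift.detectAndCompute(frame_a, None)
    kps_b, desc_b = sift.detectAndCompute(frame_b, None)

    # Brute-force matching with Lowe's ratio test to keep only reliable seeds.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    seeds = []
    for m, n in matcher.knnMatch(desc_a, desc_b, k=2):
        if m.distance < ratio * n.distance:
            seeds.append((kps_a[m.queryIdx].pt, kps_b[m.trainIdx].pt))
    return seeds  # list of ((x_a, y_a), (x_b, y_b)) seed correspondences
```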
3.2 Propagation Scheme for Coarse Motion
Registration
After matching the image seeds, which are obtained from the original seeds selected in Sec. 3.1, the regions around
Fig. 1 Low-resolution (LR) video observation model.