$\hat{X}_t(\vec{n})$. To capture the varying property of frame contents,
in MCFI the whole frame is usually divided into a number
of blocks S, and each block has a motion vector
$\vec{v} = (v_x, v_y)$ with the horizontal component $v_x$
and the vertical component $v_y$. Then $\hat{X}_t(\vec{n})$
can be formulated as
$$\hat{X}_t(\vec{n}) = w_f P_f(\vec{n}) + w_b P_b(\vec{n}) = w_f X_{t-1}(\vec{n} + \vec{v}_f) + w_b X_{t+1}(\vec{n} + \vec{v}_b) \tag{1}$$
where $w_f$ and $w_b$ are the relative weights of the forward
predicted block $P_f$ and the backward predicted block $P_b$,
and $\vec{v}_f$ and $\vec{v}_b$ represent the motion vectors in
the forward and backward reference frames. In most cases,
$w_f + w_b = 1$ and $w_f = w_b = 1/2$. More generally, the
components of $\vec{v}_f$ and $\vec{v}_b$ may take fractional
values [17]. If the motion vectors are of sub-pixel accuracy,
Eq. (1) is applied to the corresponding references with
fractional-pixel accuracy to yield the up-sampled signals accordingly.
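As an illustration of Eq. (1), the following minimal sketch interpolates one block of frame t from frames t-1 and t+1 with integer-pixel motion vectors. The function name `mcfi_block`, its parameters, and the array layout `frame[y, x]` are assumptions made for this example and do not come from the source; border handling is omitted.

```python
import numpy as np

def mcfi_block(prev_frame, next_frame, top, left, size, v_f, v_b,
               w_f=0.5, w_b=0.5):
    """Interpolate one size x size block of frame t following Eq. (1).

    prev_frame, next_frame: 2-D arrays for frames t-1 and t+1, indexed [y, x].
    (top, left): position of the block in frame t.
    v_f, v_b: integer motion vectors (v_x, v_y) into frames t-1 and t+1.
    Assumption: the displaced blocks stay inside the frame.
    """
    fx, fy = v_f
    bx, by = v_b
    # Forward prediction P_f(n) = X_{t-1}(n + v_f)
    p_f = prev_frame[top + fy: top + fy + size, left + fx: left + fx + size]
    # Backward prediction P_b(n) = X_{t+1}(n + v_b)
    p_b = next_frame[top + by: top + by + size, left + bx: left + bx + size]
    # Weighted combination of the two predictions
    return w_f * p_f + w_b * p_b
```

With the default weights, the block is the plain average of the two motion-compensated predictions, which corresponds to the common case $w_f = w_b = 1/2$.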
When a 2M-tap finite impulse response (FIR) filter is used
for the 2-D separable interpolation, the reference signals
for motion vectors with horizontal, vertical and diagonal
half-pixel accuracy in each prediction direction are given by
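$$P(\vec{n}) = \sum_{u=-M+1}^{M} h(u)\, X_r(n_x + \lfloor v_x \rfloor + u,\; n_y + v_y) \tag{2}$$

$$P(\vec{n}) = \sum_{u=-M+1}^{M} h(u)\, X_r(n_x + \lfloor v_x \rfloor,\; n_y + v_y + u) \tag{3}$$

and

$$P(\vec{n}) = \sum_{u_1=-M+1}^{M} h(u_1) \left( \sum_{u_0=-M+1}^{M} h(u_0)\, X_r(n_x + \lfloor v_x \rfloor + u_0,\; n_y + v_y + u_1) \right) \tag{4}$$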
where $\lfloor \cdot \rfloor$ denotes rounding to the nearest
integer pixel position towards minus infinity and $h(u)$
is the tap coefficient. The interpolated values at the
horizontal and vertical half-pixel positions are obtained by
applying a one-dimensional 2M-tap FIR filter horizontally and
vertically using Eqs. (2) and (3), respectively. For the
diagonal half-pixel position, the one-dimensional 2M-tap FIR
filter is applied first horizontally and then vertically using
Eq. (4). The half pixels and full pixels are then used to
interpolate the quarter pixels by the bilinear method. Fig. 2
illustrates the 1:2 frame rate up conversion process with
horizontal half-pixel accuracy in both directions when a 6-tap
FIR filter h is used. The corresponding interpolations are
first used to generate the forward and backward prediction
blocks $P_f$ and $P_b$, respectively, using Eq. (2). The
up-sampled pixel is then obtained by Eq. (1).
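The half-pixel interpolation of Eqs. (2)-(4) can be sketched as follows. The source only states that a 6-tap filter is used; the coefficients below are those of the H.264/AVC half-sample luma filter and are an assumption chosen for illustration. The function name `half_pel` and the layout `ref[y, x]` are also hypothetical. In this sketch every fractional component is floored before filtering, and border handling and quarter-pixel positions are left out.

```python
import numpy as np

# Assumed example taps h(-2), ..., h(3) (H.264/AVC 6-tap half-sample filter).
# The source does not give coefficients; these are for illustration only.
H6 = np.array([1, -5, 20, 20, -5, 1], dtype=float) / 32.0

def half_pel(ref, n_x, n_y, v_x, v_y, h=H6):
    """Predict one pixel P(n) from the reference frame ref[y, x].

    v_x and v_y are multiples of 1/2. A fractional horizontal component
    triggers the horizontal filter (Eq. (2)), a fractional vertical
    component the vertical filter (Eq. (3)), and both together the
    horizontal-then-vertical cascade (Eq. (4)).
    Assumption: all accessed samples lie inside the frame.
    """
    M = len(h) // 2
    taps = range(-M + 1, M + 1)              # u = -M+1, ..., M
    x0 = n_x + int(np.floor(v_x))            # integer column after flooring
    y0 = n_y + int(np.floor(v_y))            # integer row after flooring
    frac_x = v_x != np.floor(v_x)
    frac_y = v_y != np.floor(v_y)

    def row_value(y):
        # Horizontal pass on integer row y (or a plain sample lookup).
        if not frac_x:
            return ref[y, x0]
        return sum(h[u + M - 1] * ref[y, x0 + u] for u in taps)

    if not frac_y:
        return row_value(y0)
    # Vertical pass over the horizontally processed rows.
    return sum(h[u + M - 1] * row_value(y0 + u) for u in taps)
```

Because the filter is separable, the diagonal case reuses the horizontal pass row by row and then filters the results vertically, which matches the order described for Eq. (4).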
2.2. Derivation of the optimal down-sampled frame
Traditional MCFI usually tries to find the most faithful
motion vectors for each block to be interpolated. However,
Eq. (1) shows that the quality of the up-sampled frames
depends not only on the accuracy of the motion vectors but
also on the information contained in the forward and backward
reference frames. The more information about the frame to be
interpolated is embedded in the forward and backward reference
frames, the higher the quality of the up-sampled frames that
can be obtained. To transfer more information about the frame
to be interpolated to the down-sampled frames, an up-sampling
oriented frame rate reduction is proposed in this subsection.
Here, we take MCFI [1] as an example to describe the derivation
of the optimal down-sampled frame; the derivation can be easily
extended to other frame rate up conversion algorithms.
Define $X_t$ as the original frame in the input video at
time instance t in vector form, and let $\hat{X}_t$ be the
corresponding up-sampled frame. For simplicity, we take 1:2
MCFI as an example to derive the optimal solution of the
frame rate reduction problem; it can be easily extended to
MCFI with an arbitrary ratio. The goal of the proposed frame
rate reduction is to generate a high-quality interpolated
frame while keeping the down-sampled sequence faithful to
the input one.
Consequently, the optimal down-sampled frame should
[Figure 2: frames t-1, t and t+1 with full-pixel, fractional-pixel and up-sampled-pixel samples; fractional-pixel samples are interpolated with h(n), taps h(-2) to h(3), and combined with weights $w_f$ and $w_b$.]
Fig. 2. 1:2 frame rate up conversion when $\vec{v}_f = (1/2, 0)$ and $\vec{v}_b = (1/2, 0)$.
Y. Zhang et al. / Signal Processing: Image Communication 28 (2013) 254–266