LIU et al.: MOTION ESTIMATION OPTIMIZATION FOR H.264/AVC USING SOURCE IMAGE EDGE FEATURES 1097
[Figure 1: 1-D sampling diagram; labels: camera sensor at indices i, i + 1, i + 2, i + 3 (spacing u_x); s_t(x); s_{t−1}(x); s'_t(i·u_x); e(i·u_x); Δ_x; d_x.]
Fig. 1. Analysis of 1-D prediction error caused by edge gradient and displacement estimation error.
where $s'_t(i \cdot u_x)$ is the edge gradient of $s_t(x)$ at the $i$th camera sensor, and the displacement estimation error $\Delta_x$ is a zero-mean random variable with $\Delta_x \in [-u_x/2,\, u_x/2]$.
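The first-order error model and its worst case can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the first-order form $e(i \cdot u_x) \approx \Delta_x \cdot s'_t(i \cdot u_x)$, a hypothetical smooth signal $s_t(x) = \sin(0.3x)$, and $u_x = 1$.

```python
import numpy as np

def first_order_error(grad, delta):
    # Assumed first-order model: e(i*u_x) ~= delta_x * s'_t(i*u_x)
    return delta * grad

def worst_case_error(grad, u_x):
    # Largest |e| over delta_x in [-u_x/2, u_x/2], reached at delta_x = +/- u_x/2
    return u_x * np.abs(grad) / 2.0

u_x = 1.0
x = np.arange(8) * u_x                  # sensor positions i * u_x, i = 0..7
s = np.sin(0.3 * x)                     # hypothetical smooth signal s_t(x)
grad = 0.3 * np.cos(0.3 * x)            # its exact edge gradient s'_t(x)

deltas = np.linspace(-u_x / 2, u_x / 2, 101)                      # candidate displacement errors
err = np.abs(first_order_error(grad[None, :], deltas[:, None]))   # shape (101, 8)
peak = err.max(axis=0)                  # worst case per sensor

print(np.allclose(peak, worst_case_error(grad, u_x)))   # True
print(deltas[np.argmin(err[:, 0])])     # delta_x at which |e| vanishes

# The first-order model stays close to the exact error of a quarter-pixel shift
exact = np.sin(0.3 * (x + 0.25)) - s
print(np.max(np.abs(exact - first_order_error(grad, 0.25))))
```

The last line shows that, for a smooth signal, the neglected higher-order terms are small compared with the first-order term.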
When $\Delta_x = \pm u_x/2$, $|e(i \cdot u_x)|$ reaches its maximum value $u_x \cdot |s'_t(i \cdot u_x)|/2$, and when $\Delta_x = 0$, $|e(i \cdot u_x)|$ vanishes.
This conclusion agrees with the spectral-domain investigation of aliasing in [14]. Equation (2) also explains why MRFs are needed during prediction: if the displacement error $\Delta_{x,t-1}$ between the current image $s_t(x)$ and the first previous image $s_{t-1}(x)$ is larger in magnitude than the displacement error $\Delta_{x,t-k}$ of the $k$th previous image $s_{t-k}(x)$, then $s_{t-k}(x)$ is preferred as the prediction signal, because its prediction error caused by aliasing is smaller.
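The reference-selection argument can be illustrated with a toy sketch. The per-reference displacement errors below are hypothetical values (in a real encoder they are unknown), and `pick_reference` is an illustrative helper, not part of H.264/AVC or of the authors' algorithm; it simply ranks references by the first-order aliasing error $|\Delta_{x,t-k}| \cdot |s'_t|$.

```python
def pick_reference(delta_by_ref, grad):
    """Return the reference index k whose first-order aliasing error
    |delta_{x,t-k} * s'_t| is smallest, together with all the errors."""
    errors = {k: abs(delta * grad) for k, delta in delta_by_ref.items()}
    best = min(errors, key=errors.get)
    return best, errors

# Hypothetical displacement errors w.r.t. s_{t-1}, s_{t-2}, s_{t-3}
delta_by_ref = {1: 0.45, 2: -0.10, 3: 0.30}
best, errors = pick_reference(delta_by_ref, grad=2.0)
print(best)   # 2: the second previous frame has the smallest |delta|
```

Because the gradient is common to all references at a given pixel, the ranking depends only on $|\Delta_{x,t-k}|$, which is the comparison made in the text.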
To simplify notation in the following discussion, we assume that the spatial sampling intervals in the x- and y-directions are $u_x = u_y = 1$. From (2), the 2-D prediction error at one pixel follows directly:
$$e(i,j) \approx \Delta_x(i,j)\,\frac{\partial s_t(i,j)}{\partial x} + \Delta_y(i,j)\,\frac{\partial s_t(i,j)}{\partial y}. \tag{3}$$
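Equation (3) can be evaluated over a whole block once the partial derivatives are estimated. This minimal sketch assumes that array rows index y and columns index x, that $u_x = u_y = 1$, and that NumPy's central differences (`np.gradient`) are an acceptable edge-gradient estimator; the paper does not specify these choices here, and `prediction_error_2d` is a hypothetical helper name.

```python
import numpy as np

def prediction_error_2d(block, dx, dy):
    """First-order 2-D prediction error of eq. (3):
    e(i, j) ~= dx(i, j) * ds_t/dx + dy(i, j) * ds_t/dy.
    dx and dy may be scalars or per-pixel arrays of the block's shape."""
    gy, gx = np.gradient(block.astype(float))   # axis 0 -> y, axis 1 -> x, unit spacing
    return dx * gx + dy * gy

# Horizontal ramp: ds/dx = 10 everywhere, ds/dy = 0 everywhere
block = np.tile(np.arange(4, dtype=float) * 10.0, (4, 1))
e = prediction_error_2d(block, dx=0.5, dy=0.5)
print(e)   # every entry is 5.0: only the x-gradient contributes
```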
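If $\Delta_x(i,j)$ and $\Delta_y(i,j)$ are assumed to be independent, with $E(\Delta_x) = E(\Delta_y) = 0$ and $E(\Delta_x^2) = E(\Delta_y^2) = \sigma_\Delta^2$, then $E(e) = 0$ and the cross term $E(\Delta_x \Delta_y) = E(\Delta_x)E(\Delta_y)$ vanishes, so the variance of $e(i,j)$, denoted $\sigma^2(i,j)$, is
$$\sigma^2(i,j) = \sigma_\Delta^2 \left[ \left(\frac{\partial s_t(i,j)}{\partial x}\right)^2 + \left(\frac{\partial s_t(i,j)}{\partial y}\right)^2 \right]. \tag{4}$$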
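A quick Monte Carlo experiment can confirm (4) under its stated assumptions. In this sketch the uniform distribution on $[-1/2, 1/2]$ for $\Delta_x$ and $\Delta_y$ is an illustrative choice (it is zero-mean, equal-variance, and independent, giving $\sigma_\Delta^2 = 1/12$), and the gradient values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
u = 1.0                      # u_x = u_y = 1, as assumed in the text
gx, gy = 3.0, -4.0           # hypothetical gradients ds/dx, ds/dy at pixel (i, j)
n = 200_000

# Independent, zero-mean displacement errors on [-u/2, u/2] (illustrative choice)
dx = rng.uniform(-u / 2, u / 2, n)
dy = rng.uniform(-u / 2, u / 2, n)
sigma_delta_sq = u**2 / 12   # variance of the uniform distribution on [-u/2, u/2]

e = dx * gx + dy * gy                            # eq. (3), sample by sample
empirical = e.var()
predicted = sigma_delta_sq * (gx**2 + gy**2)     # eq. (4)
print(empirical, predicted)
```

With this many samples the empirical variance agrees with the prediction of 25/12 to within a fraction of a percent.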
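Using the per-pixel prediction error variance (4), the prediction error power of an image block can be deduced as
$$\sum_{i,j} \sigma^2(i,j) = \sigma_\Delta^2 \sum_{i,j} \left[ \left(\frac{\partial s_t(i,j)}{\partial x}\right)^2 + \left(\frac{\partial s_t(i,j)}{\partial y}\right)^2 \right] \tag{5}$$
where $(i,j) \in$ block.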
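Summing (4) over the block gives (5). The sketch below computes it for a flat and a textured 8 × 8 block; as before, central differences stand in for the partial derivatives, $\sigma_\Delta^2$ is assumed constant over the block, and the block contents are synthetic.

```python
import numpy as np

def block_error_power(block, sigma_delta_sq):
    """Eq. (5): sigma_delta^2 times the sum of squared edge-gradient amplitudes over the block."""
    gy, gx = np.gradient(block.astype(float))
    return sigma_delta_sq * float(np.sum(gx**2 + gy**2))

rng = np.random.default_rng(1)
flat = np.full((8, 8), 128.0)                              # homogeneous block
textured = 128.0 + 40.0 * rng.standard_normal((8, 8))     # synthetic texture

print(block_error_power(flat, 1 / 12))       # 0.0: no edges, no aliasing error
print(block_error_power(textured, 1 / 12))   # large: textured blocks predict poorly
```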
Like the spectral analysis represented by (1), (5) indicates that the prediction error power is determined by the image features and the displacement estimation error. In addition, the spatial analysis shows that the power of the block prediction error is proportional to the sum of the squared edge gradient amplitudes. This conclusion plays an important role in defining the early termination threshold proposed in Section IV.
[Figure 2: block diagram of the hybrid coder; labels: S_t(u,v), E_t(u,v), optimum forward channel with G(u,v) and N(u,v), F(u,v), S_{t−1}(u,v).]
Fig. 2. Model of hybrid coder with the optimum forward channel, $G(u,v) = \max[0,\, 1 - \theta/S_{ee}(u,v)]$, and the power spectral density of $N(u,v)$ is $S_{nn}(u,v) = \max[0,\, \theta(1 - \theta/S_{ee}(u,v))]$.
Equation (3) yields two important conclusions.
1) Through the displacement error terms $|\Delta_x|$ and $|\Delta_y|$, the impact of aliasing vanishes at full-pixel displacements and reaches its maximum at half-pixel displacements.
2) Through the edge gradient terms $(\partial s_t(i,j)/\partial x,\ \partial s_t(i,j)/\partial y)$, aliasing is caused by the high-frequency components of the source image.
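Both conclusions can be illustrated with a small sketch. For the sake of the example it assumes that only full-pixel displacement vectors are searched, so the displacement error is $\Delta = d - \mathrm{round}(d)$ for a true displacement $d$; `integer_pel_error` is a hypothetical helper, and the gradient values used for conclusion 2) are also hypothetical.

```python
import numpy as np

def integer_pel_error(true_disp):
    """Displacement estimation error when the estimate is rounded to a full pixel
    (illustrative assumption): delta = d - round(d), which lies in [-1/2, 1/2]."""
    return true_disp - np.round(true_disp)

d = np.array([2.0, 2.25, 2.5, 2.75, 3.0])       # true displacements in pixels
delta = integer_pel_error(d)
print(np.abs(delta))      # 0 at full-pixel, 0.5 at the half-pixel displacement 2.5

# Conclusion 2): the same |delta| costs far more on a sharp edge than on a smooth area
grads = np.array([[1.0], [20.0]])                # hypothetical |ds/dx|: smooth vs. sharp
print(np.abs(delta) * grads)                     # first-order |e| per displacement
```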
In practice, a picture rich in sharp edges necessarily contains many high-frequency components. In [22], for a 2-D spatial signal $s(x,y)$, the vector $(\partial s(x,y)/\partial x,\ \partial s(x,y)/\partial y)$ is defined as the local spatial frequency, which describes the local frequency content of a region. The spatial edge gradient analysis is superior to the spectral analysis because it reveals the local frequency content of the image at negligible computational cost. Therefore, as we shall see in Section III, when an image block contains rich texture, the power of its prediction error increases, and advanced coding tools such as VBS and MRF are required. Otherwise, the redundant computation can be skipped with negligible degradation in coding quality. This is the essence of our homogeneity-based fast algorithms.
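The idea of skipping redundant computation for homogeneous blocks can be sketched as a simple gate on the sum of squared gradients from (5). The threshold value and the helper names here are hypothetical placeholders; the paper derives its own relative-homogeneity test in Section III-A and its early termination threshold in Section IV, which this sketch does not reproduce.

```python
import numpy as np

def gradient_energy(block):
    """Sum of squared edge-gradient amplitudes, the block-dependent factor in eq. (5)."""
    gy, gx = np.gradient(block.astype(float))
    return float(np.sum(gx**2 + gy**2))

def needs_advanced_tools(block, threshold):
    """Hypothetical decision rule: run the VBS/MRF search only for textured blocks.
    `threshold` is an illustrative parameter, not the criterion derived in the paper."""
    return gradient_energy(block) > threshold

rng = np.random.default_rng(2)
smooth = np.tile(np.linspace(100, 108, 16), (16, 1))   # gentle horizontal ramp
textured = rng.integers(0, 256, size=(16, 16))         # noisy synthetic texture
THRESHOLD = 5_000.0                                    # hypothetical value

print(needs_advanced_tools(smooth, THRESHOLD), needs_advanced_tools(textured, THRESHOLD))
```

A smooth block falls far below the threshold and would be coded with only the basic mode and a single reference frame, while the textured block triggers the full search.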
III. HOMOGENEITY-BASED REFERENCE FRAME AND INTERMODE REDUCTION
Section III-A develops the relative homogeneity concept using rate-distortion theory. Section III-B then describes how futile reference frames and intermodes can be efficiently eliminated based on the relative homogeneous block detection algorithm.
A. Relative Homogeneous Block Detection Algorithm
The hybrid coder model with the optimum forward channel, shown in Fig. 2, provides a convenient basis for developing the relative homogeneity concept. Capital letters, for example $S_t(u,v)$, denote the discrete 2-D Fourier transforms of the corresponding spatial signals. Let $S_t(u,v)$ denote the small $N \times N$ image block to be encoded by the hybrid coder, and let $S_{t-1}(u,v)$ denote the prediction signal generated from the previously decoded image signals by the low-pass filter $F(u,v)$. The optimum forward channel consists of a nonideal band-limiting filter $G(u,v)$ and an additive noise $N(u,v)$.
With rate-distortion theory [23], the distortion D and the