2274 Wu et al.: A Fast TU Size Decision Method for HEVC RQT Coding
Fast Prediction Unit Mode Decision: Zhao et al. [14] have studied the impact of the number
of PU mode candidates after the RMD process, and proposed a fast PU mode decision
algorithm for intra prediction by reducing the number of PU mode candidates. Average
encoding time reductions of 20% and 28% were reported under the High Efficiency (HE) and
Low Complexity (LC) test conditions, respectively, with almost the same coding efficiency
as HM 1.0. Note that the HE and LC test conditions were used in the early phase of HEVC
development and have been merged since HM 6.0.
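The candidate reduction idea can be illustrated with a minimal sketch. It assumes that the RMD stage yields one rough cost per intra mode and that only the lowest-cost modes are passed on to the full rate-distortion search. The function name, the cost values and the candidate counts below are hypothetical and are not taken from [14] or from HM.

```python
from typing import Dict, List


def select_rdo_candidates(rough_costs: Dict[int, float],
                          num_candidates: int) -> List[int]:
    """Return the intra mode indices with the lowest rough costs, best first.

    rough_costs maps a mode index to its rough (RMD-stage) cost.
    Ties are broken by the smaller mode index so the result is deterministic.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    ranked = sorted(rough_costs, key=lambda mode: (rough_costs[mode], mode))
    return ranked[:num_candidates]


# Hypothetical rough costs for five intra modes (mode index -> cost).
costs = {0: 120.0, 1: 95.5, 10: 80.0, 26: 101.0, 34: 150.0}

# Keeping fewer candidates means fewer full rate-distortion evaluations.
full_set = select_rdo_candidates(costs, 3)
reduced_set = select_rdo_candidates(costs, 2)
```

Shrinking num_candidates trades a possible loss in coding efficiency for fewer full rate-distortion evaluations, which is the trade-off studied in [14].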
Several fast intra prediction mode decision methods have been introduced by Zhang and Ma
[15, 16]. In their works, three methods were proposed to speed up the PU mode decision: a
modified Hadamard transform, a novel progressive rough mode search, and an early
termination method for the rate-distortion optimized quantization (RDOQ) process. A 2.5x
speedup with a 1.0% BD-Rate increase was reported on HM 10.0. However, the correlations
between neighboring TUs have not yet been exploited.
Fast Transform Unit Size Decision: The RQT coding in HEVC is applied to improve the
coding efficiency, but it demands significant computational overhead [2, 3, 5-8]. Hence, Tan
et al. proposed several fast encoding schemes for both intra prediction and inter prediction for
TU size decision-making in the RQT coding [17]. For the AI case, the fast RQT coding
algorithm was reported to save 13% of the encoding time with a 0.1% BD-Rate increase. For
the RA and LB cases, the encoding time was reduced by up to 9% at the expense of a 0.3%
BD-Rate increase on HM 2.0.
Another fast RQT coding scheme was proposed by Teng et al. [18], where the original
depth-first TU size decision was replaced by a Merge-and-Split decision for the RQT coding.
The Merge-and-Split decision process was terminated, and no further TU splitting was
performed, when the current TU was a zero-block. An almost 2x speedup was reported for the
RA case under the HE configuration on HM 2.0, with about a 0.3% BD-Rate increase.
Choi et al. [19] exploited the relationship between the determined TU size and the
number of nonzero DCT coefficients (NNZ) to determine the TU size at an early stage. The
total misprediction ratio for the given NNZ with a threshold of 3 was 1.26%, which implied
a negligible loss in coding efficiency compared to the original HEVC encoder. A 0.6%
BD-Rate increase and an average 60% reduction in computational complexity were reported
for the RA case under the HE configuration on HM 3.0.
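The early decision based on the NNZ can be sketched as follows. This is a simplified illustration and not the algorithm of [19]: it assumes that splitting stops as soon as the NNZ of the current TU is at or below the threshold, and it follows a single block per depth instead of the four sub-blocks of a real quadtree. The direction of the comparison, the helper names and the coefficient values are assumptions.

```python
from typing import Callable, List

Block = List[List[int]]  # quantized transform coefficients of one TU (assumed layout)


def count_nonzero(coeffs: Block) -> int:
    """Count the nonzero quantized coefficients (NNZ) in a TU."""
    return sum(1 for row in coeffs for c in row if c != 0)


def stop_splitting(coeffs: Block, nnz_threshold: int = 3) -> bool:
    """Assumed rule: skip further splitting when NNZ <= threshold."""
    return count_nonzero(coeffs) <= nnz_threshold


def choose_tu_depth(coeffs_at_depth: Callable[[int], Block],
                    max_depth: int,
                    nnz_threshold: int = 3) -> int:
    """Return the first depth at which the NNZ rule stops the search.

    coeffs_at_depth(d) is a hypothetical callback that returns the
    quantized coefficients obtained when the TU is coded at depth d.
    """
    for depth in range(max_depth):
        if stop_splitting(coeffs_at_depth(depth), nnz_threshold):
            return depth
    return max_depth


# Hypothetical coefficient blocks: a dense one (NNZ = 6) and a sparse one (NNZ = 2).
dense = [[5, 3, 1, 0], [2, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
sparse = [[5, 0, 0, 0], [0, -1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
blocks = {0: dense, 1: sparse}
chosen_depth = choose_tu_depth(lambda d: blocks[d], max_depth=2)
```

In this toy example the search stops at depth 1, so the coefficients at depth 2 are never computed; that skipped work is the source of the complexity reduction.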
Furthermore, Zhang and Zhao [20] have developed an adaptive RQT coding algorithm for
inter prediction by restricting the smaller transform depth level for larger CU size and vice
versa, based on observations from the RQT coding. A 0.7% BD-Rate increase with a
7.2%~21% reduction in computational complexity was reported under the HE and LC
configurations on HM 4.0. However, the proposed algorithm applies to inter prediction only;
intra prediction is not addressed.
Our earlier work presented a fast RQT coding algorithm based on experimental
observations [21]. When the current CU is split into four quadrants, the RQT coding of the
upper-left sub-CU is performed first, followed by the individual RQT coding of the
remaining three sub-CUs, whose smallest TU size is predetermined to be that of the
upper-left sub-CU. Therefore, many TU size decisions can be skipped in the RQT coding.
This fast RQT coding algorithm is updated in Section 3 to skip even more TU size decisions.
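The ordering described above can be sketched as follows. The sketch assumes that the smallest TU size of a sub-CU corresponds to the deepest RQT depth chosen for it, and that an RQT search routine accepts a depth limit. The callback rqt_search, the quadrant labels and the depth values are hypothetical and serve only to illustrate the constraint.

```python
from typing import Callable, List, Sequence


def code_quadrants(sub_cus: Sequence[object],
                   rqt_search: Callable[[object, int], int],
                   full_max_depth: int) -> List[int]:
    """Code four sub-CUs, upper-left first, and return their deepest TU depths.

    rqt_search(sub_cu, max_depth) is a hypothetical callback that runs the
    RQT search limited to max_depth and returns the deepest depth it chose.
    """
    if len(sub_cus) != 4:
        raise ValueError("expected exactly four sub-CUs")

    # The upper-left sub-CU is searched over the full depth range.
    upper_left_depth = rqt_search(sub_cus[0], full_max_depth)
    depths = [upper_left_depth]

    # The remaining sub-CUs may not use TUs smaller than the upper-left one.
    for sub_cu in sub_cus[1:]:
        depths.append(rqt_search(sub_cu, upper_left_depth))
    return depths


# Toy search: each quadrant "prefers" a depth but must respect the limit.
preferred = {"UL": 1, "UR": 2, "LL": 0, "LR": 3}
calls = []


def toy_search(sub_cu: object, max_depth: int) -> int:
    calls.append((sub_cu, max_depth))
    return min(preferred[sub_cu], max_depth)


depths = code_quadrants(["UL", "UR", "LL", "LR"], toy_search, full_max_depth=3)
```

Here the upper-left quadrant settles at depth 1, so the other three quadrants are searched only up to depth 1 and the deeper TU sizes are never evaluated for them.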
The aforementioned algorithms effectively reduce the computational complexity of
HEVC encoders. However, the TU size correlations between neighboring CUs
have not been fully studied. In order to reduce the computational complexity required by the RQT