IEEE TRANSACTIONS ON BROADCASTING, VOL. 65, NO. 4, DECEMBER 2019 777
A Novel Objective Quality Assessment Method for Transcoded Videos
From H.264/AVC to H.265/HEVC Utilizing Probability Theory
Xiwu Shang, Haiwu Zhao, Guozhong Wang, Xiaoli Zhao, and Yifan Zuo
Abstract—The latest video coding standard, H.265/HEVC, was developed
to succeed the previous coding standard, H.264/AVC. However, a large
amount of legacy content was coded with H.264/AVC. Therefore,
transcoding from H.264/AVC to H.265/HEVC format is required. During
transcoding, the distortion of the transcoded video with respect to the
H.264/AVC-decoded video is easy to calculate. However, since the
original video is usually unavailable, the distortion between the
original video and the transcoded video is unknown, which makes it
difficult to control the coding quality of the transcoded video relative
to the original video. In this paper, we propose a novel and accurate
quality estimation method for transcoded videos utilizing probability
theory. Experimental results demonstrate that the predicted quality of
transcoded videos approximates the true value, with average errors of
0.28 dB, 0.41 dB, and 0.46 dB for the Y, Cb, and Cr components, respectively.
Index Terms—Video quality assessment, PSNR, probability theory.
I. INTRODUCTION
THE H.264/AVC standard [1] has been widely used in practical
applications such as online video streaming, satellite broadcasting,
and applications over cable or wireless networks. However, with the
popularity of high definition (HD) or even ultra high definition (UHD)
video, a more efficient video coding standard is urgently required.
Therefore, in 2010, the Joint Collaborative Team on Video Coding
(JCT-VC) was established to develop the next coding standard. In
January 2013, the High Efficiency Video Coding standard H.265/HEVC was
formally finalized, roughly doubling the compression performance
compared with H.264/AVC [1], [2]. Therefore, it is expected that
H.265/HEVC will gradually replace H.264/AVC in the near future.
Considering the large amount of H.264/AVC-coded legacy content and the
high coding performance of H.265/HEVC, a transcoder [3], [4] that can
convert H.264/AVC bitstreams into H.265/HEVC bitstreams is required in
many applications.
Generally, a transcoder converts one compressed video format into
another, which may involve changing the encoding syntax, frame rate,
bitrate, or spatial resolution [4]. In this paper, we focus on
transcoders that convert H.264/AVC bitstreams into H.265/HEVC
bitstreams. A naive transcoder, shown in the dashed box in Fig. 1, is
composed of a decoder cascaded with an encoder. The input bitstream is
first decoded by the H.264/AVC decoder. The reconstructed video is then
coded into an H.265/HEVC bitstream by the H.265/HEVC encoder. To provide
satisfactory quality for the transcoded video, video quality assessment
models are required to monitor the quality of service (QoS).
Fig. 1. The process of transcoding.
Manuscript received April 2, 2019; revised June 28, 2019; accepted July
29, 2019. Date of publication August 20, 2019; date of current version
December 10, 2019. This work was supported in part by the National
Science Foundation of China under Grant 61601296, and in part by the
Start-Up Research Project of SUES under Grant 0232-E3-0507-19-05106.
(Corresponding author: Guozhong Wang.)
X. Shang, G. Wang, and X. Zhao are with the School of Electronic
and Electrical Engineering, Shanghai University of Engineering Science,
Shanghai 201620, China (e-mail: dxsxw@126.com; wanggz@sues.edu.cn;
evawhy@163.com).
H. Zhao is with the School of Communication and Information
Engineering, Shanghai University, Shanghai 200444, China (e-mail:
zhaohaiwu@i.shu.edu.cn).
Y. Zuo is with the School of Information Technology, Jiangxi
University of Finance and Economics, Nanchang 330013, China (e-mail:
kenny0410@sina.com).
Color versions of one or more of the figures in this article are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TBC.2019.2932286
Traditional video quality assessment schemes are classified into
three categories based on the availability of the original videos:
full-reference (FR) schemes, reduced-reference (RR) schemes, and
no-reference (NR) schemes.
FR schemes assume that the original video is available and measure the
quality of the impaired video against it. The peak signal-to-noise ratio
(PSNR) and structural similarity (SSIM) [5], [6] are examples of such
schemes. PSNR and SSIM measure the distortion of only one channel
rather than that of all three channels (YCbCr). In our previous
works [7], [8], we proposed a color-sensitivity-based combined PSNR
(CSPSNR) method, based on the sensitivity of the three channels, to
measure the quality of the entire sequence.
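As a minimal sketch of the per-channel PSNR discussed above, the following Python snippet computes the MSE and the PSNR separately for the Y, Cb, and Cr planes. The function names, the flat-list plane representation, the 8-bit sample depth, and the toy sample values are assumptions made for illustration; they are not taken from the paper. The sketch does not implement CSPSNR, whose channel weights are defined in [7], [8].

```python
import math


def mse(reference, impaired):
    """Mean squared error between two equally sized planes (flat sequences)."""
    if len(reference) != len(impaired) or len(reference) == 0:
        raise ValueError("planes must be non-empty and the same size")
    total = sum((r - d) ** 2 for r, d in zip(reference, impaired))
    return total / len(reference)


def psnr(reference, impaired, bit_depth=8):
    """PSNR in dB for a single plane; infinite when the planes are identical."""
    peak = (1 << bit_depth) - 1  # 255 for 8-bit samples (assumed depth)
    err = mse(reference, impaired)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)


# Toy 2x2 planes (hypothetical sample values, flattened row by row).
original = {
    "Y": [52, 55, 61, 66],
    "Cb": [128, 128, 130, 130],
    "Cr": [120, 121, 119, 120],
}
transcoded = {
    "Y": [53, 54, 61, 68],
    "Cb": [128, 129, 130, 131],
    "Cr": [120, 121, 118, 120],
}

# One PSNR value per channel, as in a full-reference measurement.
per_channel = {name: psnr(original[name], transcoded[name]) for name in original}
for name, value in per_channel.items():
    print(f"{name}: {value:.2f} dB")
```

Each channel yields its own PSNR value, which is why a single-channel figure does not describe the quality of the whole YCbCr signal.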
RR schemes evaluate the video quality by referencing part of the
original data. In [9], an RR quality assessment algorithm is proposed
that reorganizes DCT coefficients into three subbands, each of which
is modeled as a generalized Gaussian density (GGD). The parameters
of the GGD of the original picture are sent to the decoder side to
analyze the distortion. Wang et al. [10] assume that image quality is
closely related to the amount of uncertainty [11] and primary visual
information [12] according to the internal generative mechanism (IGM).
They then develop an RR method for screen content images by comparing
the similarities of the two components.
NR schemes measure quality blindly, without using the original
videos, which is more practical in real applications because the
undistorted reference signal is usually unavailable. Several blind
video quality assessment algorithms have been proposed. Mittal et al. [13]
proposed the Blind/Referenceless Image Spatial QUality Evaluator
(BRISQUE), which operates in the spatial domain and analyzes the
statistical characteristics of locally normalized luminance
coefficients to quantify the distortion of natural images.
Shim et al. [14] estimate the PSNR at the decoder side of
H.264/AVC by approximating the distribution of DCT coefficients
as a Cauchy distribution. Calculating the PSNR is equivalent to
calculating the mean squared error (MSE), and the MSE in the spatial
domain can be calculated in the transform domain according to Parseval's
theorem. Methods [15]–[18] are all developed in a similar way. The
previous methods focus on estimating the PSNR due to quantization
0018-9316 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.