Bitstream-based Model Standard for 4K/UHD:
ITU-T P.1204.3 – Model Details, Evaluation,
Analysis and Open Source Implementation
Rakesh Rao Ramachandra Rao∗, Steve Göring∗, Peter List†,
Werner Robitza∗, Bernhard Feiten†, Ulf Wüstenhagen†, Alexander Raake∗
∗Dept. of Audio Visual Technology, Technische Universität Ilmenau, Germany
Email: [rakesh-rao.ramachandra-rao, steve.goering, werner.robitza, alexander.raake]@tu-ilmenau.de
†Deutsche Telekom AG, Technology & Innovation, Germany
Email: [peter.list, bernhard.feiten, ulf.wuestenhagen]@telekom.de
Abstract—With the increasing demand of users to view
high-quality videos under constrained bandwidth, typically realized
using HTTP-based adaptive streaming, it becomes more and
more important to accurately determine the quality of the encoded
videos, to assess and possibly optimize the overall streaming
quality. In this paper, we describe a bitstream-based no-reference
video quality model developed as part of the latest model-
development competition conducted by ITU-T Study Group 12
and the Video Quality Experts Group (VQEG), “P.NATS Phase
2”. It is now part of the new P.1204 series of Recommendations as
P.1204.3. It can be applied to bitstreams encoded with H.264/AVC,
HEVC and VP9, using various encoding options, including
resolution, bitrate, framerate and typical encoder settings such as
number of passes, rate control variants and speeds. The proposed
model follows an ensemble-modelling–inspired approach with
weighted parametric and machine-learning parts to efficiently
leverage the performance of both approaches. The paper provides
details about the general approach to modelling, the features used
and the final feature aggregation. The model creates per-segment
and per-second video quality scores on the 5-point Absolute
Category Rating scale, and is applicable to segments of 5–10
seconds duration. It covers both PC/TV and mobile/tablet viewing
scenarios. We outline the databases on which the model was
trained and validated as part of the competition, and perform
an additional evaluation using a total of four independently
created databases, where resolutions varied from 360p to 2160p,
and frame rates from 15–60fps, using realistic coding and
bitrate settings. We found that the model performs well on the
independent dataset, with a Pearson correlation of 0.942 and
an RMSE of 0.42. We also provide an open-source reference
implementation of the described P.1204.3 model, as well as the
multi-codec bitstream parser required to extract the input data,
which is not part of the standard.
Index Terms—bitstream model, video quality, machine learning, HTTP adaptive streaming
2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)
I. INTRODUCTION
With the advancement in image capture technology in both
cameras and mobile phones, the number of content producers
generating and streaming 4K content is increasing rapidly. In
addition to this, most of today’s streaming platforms such
as Netflix, YouTube, or Amazon Prime Video also stream
content in 4K to provide a more immersive experience to the
end-user. Along with the need of Internet Service Providers
(ISP) and “over-the top” (OTT) streaming providers to ensure
a high degree of satisfaction among their customers, these
developments highlight the need for efficient video quality
algorithms that can be used to assess, benchmark and possibly
optimize the overall streaming quality, and enhance the Quality
of Experience (QoE) for the end-user.
HTTP-based adaptive streaming (HAS) is the preferred
technology for streaming video content over the Internet, with
Dynamic Adaptive Streaming over HTTP (DASH) or HTTP
Live Streaming (HLS) being two popular implementations.
As a consequence, video quality algorithms need to consider
HAS-specific features such as quality switches and stalling
during playout.
In view of both these developments, namely, the pervasion
of high-quality contents and the increasing usage of HAS-
based technologies for streaming, it becomes important to
develop adequate video quality algorithms. In general, video
quality models can be distinguished into three main categories
depending on their input data, namely, 1) media-layer
or pixel-based models, 2) bitstream-layer models, and
3) hybrid models. Here, media-layer models use the decoded
video to estimate video quality scores, bitstream-layer models
use the encoded bitstream for video quality estimation, and
hybrid models use a combination of decoded video and
encoded bitstream as input to predict video quality [15, 1].
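The distinction between the three categories lies purely in the input data each model consumes. A minimal sketch illustrates this; the function names and the toy scoring logic are hypothetical and are not part of ITU-T P.1204 or any other standard:

```python
# Hypothetical sketch of the three model categories, distinguished by
# input type. The names and scoring logic below are illustrative only,
# NOT taken from any standardized model.

def media_layer_model(decoded_frames: list) -> float:
    """Media-layer (pixel-based): input is the decoded video only."""
    # Placeholder: a real model would compute pixel-domain features here.
    return 5.0 if decoded_frames else 1.0

def bitstream_layer_model(bitstream_stats: dict) -> float:
    """Bitstream-layer: input is the encoded bitstream, without decoding."""
    # Toy mapping from an average quantization parameter (QP) to the
    # 5-point ACR scale: higher QP -> lower predicted quality.
    qp = bitstream_stats.get("avg_qp", 30)
    return max(1.0, min(5.0, 5.0 - (qp - 20) / 10.0))

def hybrid_model(decoded_frames: list, bitstream_stats: dict) -> float:
    """Hybrid: combines decoded video and encoded bitstream inputs."""
    return (media_layer_model(decoded_frames)
            + bitstream_layer_model(bitstream_stats)) / 2.0
```

Bitstream-layer models such as the one described in this paper thus avoid the computational cost of decoding, at the price of requiring access to codec-level statistics.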
One example of a video quality algorithm developed to
handle HAS-specific scenarios is the audiovisual QoE model
according to ITU-T Rec. P.1203. The bitstream-based model
comprises components for short-term video and short-term
audio quality prediction, and for quality integration – together
with initial loading delay and stalling information. P.1203 is
generally suitable for more accurate quality predictions based
on full bitstream access, or can be used for more lightweight
quality estimation based only on metadata, such as resolution,
framerate and bitrate [1].
In the context of the standardization work conducted in
ITU-T Study Group 12 (SG12), the bitstream-layer models
978-1-7281-5965-2/20/$31.00 ©2020 IEEE