深度学习驱动的自适应视频传输框架提升用户体验

需积分: 13 130 浏览量更新于2024-07-18 收藏 2.39MB PDF 举报

"本文档探讨的主题是'神经自适应内容感知互联网视频传输'(Neural Adaptive Content-aware Internet Video Delivery)，这是一项创新的视频编解码与传输技术。作者Hyunho Yeo、Youngmok Jung、Jaehong Kim、Jinwoo Shin和Dongsu Han来自韩国高级科学技术研究院(KAIST)。论文的背景是随着互联网视频流媒体在过去的几十年里迅速增长，视频质量的提供却严重依赖于网络带宽资源。这导致在网络条件不佳时，用户的体验质量(QoE)不可避免地受到影响。论文提出了一种新的视频交付框架，它利用客户端计算和深度神经网络(DNNs)技术来减少对高画质视频传输的高度依赖。通过DNN的应用，该系统能够独立于可用带宽提升视频质量，从而提高用户体验。设计的系统旨在克服诸如客户端多样性、与比特率自适应策略的交互以及DNN模型的传输等实际挑战。在实际的3G和宽带网络测试中，与现有技术相比，提议的系统表现出了显著的优势。在保持相同带宽预算的情况下，它提高了平均用户QoE的43.08%，或者在提供相同用户体验的同时，节省了17.13%的带宽。这一突破性的成果对于提升互联网视频服务的质量和效率具有重要意义，展示了深度学习如何在现代通信领域发挥关键作用，尤其是在动态网络环境中优化用户体验。"

SSIM = 1

(a) Original (1080p)

0.94

0.93

0.91

(b) Content-aware DNN

0.88

0.84

0.89

0.86

0.87

(d) 240p

Figure 2: 240p to 1080p super-resolution results

(Content type – 1st row: Game [15], 2nd row: Entertainment [14], 3rd row: News [16])

processing used in Facebook. ExCamera [31] uses mas-

sive parallelism to enable interactive and collaborative

editing. They focus on solving distributed system prob-

lems within a datacenter without changing the clients,

whereas we focus on the division of work between the

servers and clients.

Studies on video control plane

[32, 41, 44, 51] identify

spatial and temporal diversity of CDNs in performance

and advocate for an Internet-scale control plane which

coordinates client behaviors to collectively optimize user

QoE. Although they control client behaviors, they do not

utilize client computation to directly enhance the video

quality.

4 Key Design Choices

Achieving our goal requires redesigning major compo-

nents of video delivery. This section describes the key

design choices we make to overcome practical challenges.

4.1 Content-aware DNN

Key challenge.

Developing a universal DNN model that

works well across all Internet video is impractical be-

cause the number of video episodes is almost inﬁnite.

A single DNN of ﬁnite capacity, in principle, may not

be expressive enough to capture all of them. Note, a

fundamental trade-off exists between generalization and

specialization for any machine learning approach (i.e.,

as the model coverage becomes larger, its performance

degrades), which is referred to as the ‘no free lunch’ the-

orem [75]. Even worse, one can generate ‘adversarial’

new videos of arbitrarily low quality, given any existing

DNN model [38, 58], making the service vulnerable to

reduction of quality attacks.

NAS’ content-aware model.

To tackle the challenge, we

consider a content-aware DNN model in which we use a

Start

End

Large timescale redundancy

Short timescales

: Intra-frame coding

: Inter-frame coding

Group of Pictures (GOP)

H.26x, VPx

Input

Output

Update

Target

Recover high-quality redundancy

(e.g., Super-resolution)

DNN

Content-aware DNN

Figure 3: Content-aware DNN based video encoding

different DNN for each video episode. This is attractive

because DNNs typically achieve near-zero training error,

but the testing error is often much higher (i.e., over-ﬁtting

occurs) [67]. Although the deep learning community

has made extensive efforts to reduce the gap [40, 67],

relying on the DNN’s testing accuracy may result in un-

predictable performance [38, 58].

NAS

exploits DNN’s

inherent overﬁtting property to guarantee reliable and

superior performance.

Figure 2 shows the super-resolution results of our

content-aware DNN and a content-agnostic DNN

trained on standard benchmark images (NTIRE 2017

dataset [19]). We use 240p images as input (d) to the

super-resolution DNNs to produce output (b) or (c). The

images are snapshots of video clips from YouTube. The

generic, universal model fails to achieve high quality con-

sistently over a variety of contents—in certain cases, the

quality degrades after processing. In

5.1, we show how

to design a content-aware DNN for adaptive streaming.

The content-aware approach can be seen as a type of

video compression as illustrated in Figure 3. The content-

aware DNN captures redundancy that occurs at large time

scales (e.g. multiple GOPs) and operates over the entire

video. In contrast, the conventional codecs deals with re-

648 13th USENIX Symposium on Operating Systems Design and Implementation USENIX Association

剩余16页未读，继续阅读

An_27

粉丝: 31

深度学习驱动的自适应视频传输框架提升用户体验

FlexPaper1.4

论文翻译（详见描述）

paper

RDKit_Paper:描述RDKit的简短出版物

paper-badge:元素的材料设计状态描述符

Paper Gestalt

ISAR paper

paper search

flash paper

paper pass

最新资源