Deep Learning-Driven Fast Prediction for Screen Content Coding


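This article examines deep learning-based fast prediction for screen content coding (DeepSCC), an application of deep learning to Screen Content Coding (SCC). SCC is an extension of High Efficiency Video Coding (HEVC) that introduces two new coding modes, Intra Block Copy (IBC) and Palette (PLT), to improve the coding efficiency of screen content video. However, the flexible quadtree partitioning of the Coding Tree Unit (CTU) in HEVC, together with the large number of mode candidates, makes it difficult to design fast SCC algorithms. To reduce computation and speed up encoding, a fast SCC encoder must predict quickly and accurately which partitions and coding modes the different regions of a frame will use. Deep learning can extract features automatically from large amounts of image data and make efficient predictions. DeepSCC therefore uses a deep neural network, such as a convolutional neural network, to predict the coding modes within a CTU and accelerate the encoder's mode decisions without significantly sacrificing coding efficiency. The DeepSCC network is likely to include the following key parts:

1. **Input processing**: Each video frame is divided into CTUs, and each CTU is fed to the network. According to the paper, DeepSCC jointly analyzes the content of the current CTU and the optimal mode maps of the collocated CTU.
2. **Feature extraction**: Learned layers (such as convolutional layers) identify and abstract the patterns in a CTU that help distinguish between coding modes.
3. **Mode decision**: The network then decides which of Intra, IBC, and PLT should be checked, instead of merely classifying CUs into natural-image and screen-content types.
4. **Fast decision**: Because unlikely modes are skipped before the rate-distortion (RD) search, encoding time can be reduced substantially while coding efficiency stays close to that of the exhaustive SCC encoder.
5. **Training and tuning**: The model is trained and tuned on various types of screen content to improve its prediction accuracy.
6. **Performance evaluation**: The paper's "Computational Complexity Reduction" part most likely evaluates both the time savings and the coding performance, to confirm that the approach is practical.

In summary, the article studies how deep learning can address the complexity that SCC's additional coding modes introduce, and it builds a learned prediction network that offers a promising way to accelerate screen content video coding. As deep learning continues to advance, fast prediction methods of this kind may become a key technique in future video coding.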

1051-8215 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSVT.2019.2929317, IEEE Transactions on Circuits and Systems for Video Technology


… CU without analyzing the actual CU content, the proposed DeepSCC jointly analyzes the optimal mode maps of the collocated CTU and the content of the current CTU to avoid error propagation. 4) The proposed DeepSCC contains many trainable parameters and learns rich features, so it directly performs the mode decision for Intra, IBC, and PLT rather than the simple CU type classification in [20], [22]–[24]. As a result, the decisions for IBC and PLT can differ, and many SCBs check only one of IBC and PLT, which further reduces the computational complexity.

The rest of this paper is organized as follows. Section II presents the review and analysis of intra prediction in SCC. Section III presents the proposed fast network DeepSCC. The experimental results are presented in Section IV to verify the performance of the proposed DeepSCC. Finally, Section V concludes the paper.

II. REVIEW AND ANALYSIS OF INTRA PREDICTION IN SCC

A. Review on Intra Prediction in SCC

A CTU is the basic processing unit in SCC. To find the optimal CTU coding structure, a CTU is recursively partitioned into CUs at four depth levels, d ∈ {0, 1, 2, 3}. As shown in Fig. 1, a CTU of 64×64 pixels is partitioned into four CUs of 32×32 pixels, each 32×32 CU is further partitioned into four smaller CUs, and so on until CUs of 8×8 pixels are reached. Therefore, a CTU contains 85 CU partitions (1 + 4 + 16 + 64). In each CU, an exhaustive mode search is performed to find its sub-optimal mode, i.e., the best mode for that CU considered on its own, as shown in Fig. 2.
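The count of 85 CU partitions can be checked with a short recursive walk over the quadtree of one 64×64 CTU. This is only a sketch that enumerates candidate CUs; the function name `cu_partitions` is hypothetical, and no coding decision is modeled.

```python
def cu_partitions(ctu_size=64, min_cu=8):
    """Enumerate every candidate CU (x, y, size, depth) in one CTU quadtree."""
    cus = []

    def visit(x, y, size, depth):
        cus.append((x, y, size, depth))
        if size > min_cu:  # keep splitting until 8x8 CUs are reached
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    visit(x + dx, y + dy, half, depth + 1)

    visit(0, 0, ctu_size, 0)
    return cus


cus = cu_partitions()
per_depth = {}
for _, _, size, depth in cus:
    per_depth[depth] = per_depth.get(depth, 0) + 1
print(len(cus), per_depth)  # 85 {0: 1, 1: 4, 2: 16, 3: 64}
```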

Besides the Intra mode of HEVC, which is used to encode traditional NIBs, SCC adopts two new modes, IBC and PLT, to improve the coding efficiency of SCBs. IBC mode is based on the observation that SCBs often contain repeated patterns within the same frame. When encoding the current CU, IBC searches the reconstructed region of the current frame for the best-matched block, and the displacement of that block from the current CU is represented by a block vector.
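The IBC search can be illustrated with a minimal full-search sketch that uses the sum of absolute differences (SAD) as the matching cost. The validity rule is a simplifying assumption: a candidate counts as reconstructed only if it lies entirely above the current block row, or entirely to its left in the same row. Real SCC encoders restrict the search range and use faster search strategies. The names `sad` and `ibc_search` are hypothetical.

```python
def sad(frame, ax, ay, bx, by, n):
    """Sum of absolute differences between two n x n blocks of a 2-D list."""
    return sum(abs(frame[ay + j][ax + i] - frame[by + j][bx + i])
               for j in range(n) for i in range(n))


def ibc_search(frame, cx, cy, n):
    """Full search for the best-matched n x n block in the already-coded area.

    Simplified validity rule (assumption): the candidate lies entirely above
    the current block row, or entirely to its left in the same row.
    Returns ((dx, dy), sad) for the best block vector, or None.
    """
    best = None
    height, width = len(frame), len(frame[0])
    for by in range(height - n + 1):
        for bx in range(width - n + 1):
            above = by + n <= cy
            left_same_row = by == cy and bx + n <= cx
            if not (above or left_same_row):
                continue
            cost = sad(frame, cx, cy, bx, by, n)
            if best is None or cost < best[1]:
                best = ((bx - cx, by - cy), cost)
    return best


# Toy frame: the same 2x2 pattern appears at (1, 0) and at the current block (5, 2).
pattern = [[1, 2], [3, 4]]
frame = [[0] * 8 for _ in range(4)]
for j in range(2):
    for i in range(2):
        frame[0 + j][1 + i] = pattern[j][i]
        frame[2 + j][5 + i] = pattern[j][i]
print(ibc_search(frame, 5, 2, 2))  # ((-4, -2), 0)
```

The block vector (-4, -2) points 4 samples left and 2 rows up, to the earlier copy of the pattern, and the zero SAD means the copy is exact.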

PLT mode is based on the observation that an SCB usually contains a limited number of distinct colors. PLT predicts a palette table, which contains several representative sample values, from previously coded CUs. An index map is then sent to the decoder to indicate which representative sample value is used at each position in the CU. In the exhaustive mode search, a Lagrange RD cost $J_x$ is calculated for a mode $x$ as

$$J_x = D_x + \lambda \times R_x \qquad (1)$$

where $x \in \{\text{Intra}, \text{IBC}, \text{PLT}\}$, $\lambda$ is the Lagrange multiplier, and $D_x$ and $R_x$ are the distortion and bit cost of the CU coded with mode $x$. The sub-optimal mode for a CU is the one with the smallest value of $J_x$. After $J_x$ has been calculated, the optimal CTU coding structure is selected as the one with the smallest total RD cost. The corresponding sub-optimal modes of the selected CUs then become their optimal modes and are included in the final bitstream.
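The palette and index-map representation used by PLT can be sketched for a single color component as follows. This is not the SCM algorithm: choosing the most frequent values as the palette, the error tolerance `max_error`, and the `ESCAPE` marker are simplifying assumptions, and `reused_entries` only illustrates the idea of copying entries from a predictor built from earlier CUs. All names are hypothetical.

```python
from collections import Counter

ESCAPE = -1  # hypothetical marker for samples not represented by the palette


def build_palette(block, max_size=4):
    """Choose the most frequent sample values of the block as the palette."""
    counts = Counter(v for row in block for v in row)
    return [value for value, _ in counts.most_common(max_size)]


def index_map(block, palette, max_error=0):
    """Replace each sample by the index of its nearest palette entry, or ESCAPE."""
    indices = []
    for row in block:
        out_row = []
        for v in row:
            idx = min(range(len(palette)), key=lambda k: abs(palette[k] - v))
            out_row.append(idx if abs(palette[idx] - v) <= max_error else ESCAPE)
        indices.append(out_row)
    return indices


def reused_entries(palette, predictor):
    """Flag palette entries that could be copied from a predictor list."""
    return [value in predictor for value in palette]


# A two-color text-like block with one outlier sample (57).
block = [
    [200, 200, 200, 10],
    [200, 200, 10, 10],
    [200, 10, 10, 10],
    [200, 10, 10, 57],
]
palette = build_palette(block, max_size=2)
indices = index_map(block, palette)
print(palette)                             # [10, 200]
print(indices[3])                          # [1, 0, 0, -1]
print(reused_entries(palette, [200, 64]))  # [False, True]
```

Only two palette values plus a 4×4 map of small indices (and one escape sample) describe the block, which is why PLT suits SCBs with few distinct colors.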

As shown in Fig. 1, a CTU contains 85 CU partitions, and each CU checks three mode candidates, except that CUs at depth level 0 check only the IBC and Intra modes. Therefore, the RD cost $J_x$ is calculated for 254 mode candidates in a CTU (1×2 + 84×3). Although the hierarchical CTU partitioning structure and the exhaustive mode search achieve the best coding performance, they impose a significant computational burden on an SCC encoder. Since only a subset of these modes, between 1 and 64 per CTU, is included in the final bitstream, precise prediction of the optimal modes in a CTU can greatly reduce encoding time.
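The exhaustive search described above, using the RD cost of eq. (1) per CU and a bottom-up comparison of each CU against the sum of its four sub-CUs, can be sketched as below. It also counts the RD checks (254 per CTU) and returns the final CUs (between 1 and 64). The callback `dist_rate`, which returns $(D_x, R_x)$, and the toy cost models `flat` and `detailed` are hypothetical stand-ins for real encoding; practical encoders also add early-termination rules that are omitted here.

```python
MODES = ("Intra", "IBC", "PLT")


def candidate_modes(depth):
    """Mode candidates per CU; as stated in the text, depth-0 CUs skip PLT."""
    return ("Intra", "IBC") if depth == 0 else MODES


def best_mode(x, y, size, depth, lam, dist_rate):
    """Sub-optimal mode of one CU: the candidate minimizing J = D + lam * R."""
    costs = []
    for mode in candidate_modes(depth):
        d, r = dist_rate(x, y, size, mode)  # hypothetical callback -> (D_x, R_x)
        costs.append((d + lam * r, mode))
    return min(costs)


def decide_ctu(x, y, size, depth, lam, dist_rate, counter, min_cu=8):
    """Exhaustive quadtree RD decision; returns (total J, [(x, y, size, mode)])."""
    j_here, mode = best_mode(x, y, size, depth, lam, dist_rate)
    counter[0] += len(candidate_modes(depth))  # RD checks performed for this CU
    if size == min_cu:
        return j_here, [(x, y, size, mode)]
    half = size // 2
    j_split, split_cus = 0.0, []
    for dy in (0, half):
        for dx in (0, half):
            j, sub = decide_ctu(x + dx, y + dy, half, depth + 1,
                                lam, dist_rate, counter, min_cu)
            j_split += j
            split_cus += sub
    if j_here <= j_split:  # keep the larger CU if it is not more expensive
        return j_here, [(x, y, size, mode)]
    return j_split, split_cus


def flat(x, y, size, mode):
    # Toy model: zero distortion and a fixed rate per mode, so splitting never pays off.
    return 0.0, {"Intra": 12.0, "IBC": 10.0, "PLT": 11.0}[mode]


def detailed(x, y, size, mode):
    # Toy model: distortion grows quickly with CU size, so splitting always pays off.
    return float(size ** 3), 1.0


count = [0]
j, cus = decide_ctu(0, 0, 64, 0, 1.0, flat, count)
print(count[0], cus)  # 254 [(0, 0, 64, 'IBC')]

count2 = [0]
j2, cus2 = decide_ctu(0, 0, 64, 0, 1.0, detailed, count2)
print(count2[0], len(cus2))  # 254 64
```

Both toy cases perform the same 254 RD checks even though the final structures differ (1 CU versus 64 CUs). This is the cost that accurate mode prediction aims to cut.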

B. Analysis of Intra Prediction in SCC and Motivation of DeepSCC

To analyze intra prediction in SCC, experiments were performed on sequences in YUV 4:4:4 format using the HEVC-SCC reference software, Screen Content Model version 8.3 (SCM-8.3) [25]. The test sequences, selected by experts in the JCT-VC group, were encoded by SCM-8.3 with quantization parameters (QPs) of 22, 27, 32, and 37 under the All Intra (AI) configuration defined in the common test conditions (CTC) [26]. The sequences are classified into four categories according to their content: text and graphics with motion (TGM), mixed content (M), animation (A), and camera-captured content (CC). Fig. 3 shows examples of test sequences in the four categories. Because sequences in TGM and M contain a mix of NIBs and SCBs, whereas sequences in A and CC contain only NIBs, the following sections report average results separately for TGM+M and A+CC.

Table I shows the mode distribution of each sequence, calculated as the percentages of Intra-, IBC-, and PLT-coded area in the sequence. Since sequences in A+CC contain only NIBs, 97.46% of their area is encoded in Intra mode on average. Therefore, the CU type classification in [20], [22]–[24] is efficient for NIBs because it skips both IBC and PLT. However, the mode distributions of sequences in TGM+M are much more complicated, with all three modes taking up large percentages. Even though “ChineseEditing”, “Console”, “Desktop”, and “FlyingGraphics” contain only SCBs, Intra mode still takes up 10.06%-14.56% of those sequences. In addition, IBC and PLT are not evenly distributed. For example, IBC takes up 70.93% of “FlyingGraphics”, while PLT takes up only 16.72%. Comparatively, SCBs in “Map” are more likely to select PLT

Fig. 2. Exhaustive mode search in a CU.

Fig. 3. Examples of testing sequences in four categories: MissionControlClip3 (M), Desktop (TGM), Robot (A), and Kimono1 (CC).
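The mode distribution in Table I is described as the percentage of coded area per mode, averaged over the sequences of a group such as TGM+M or A+CC. The sketch below shows that arithmetic on two tiny hypothetical CU lists, which are not data from the paper. It assumes square CUs given as `(size, mode)` pairs and an unweighted mean across sequences. The helper names are hypothetical.

```python
MODES = ("Intra", "IBC", "PLT")


def mode_distribution(coded_cus):
    """Percentage of coded area per mode from a list of final (size, mode) CUs."""
    area = {mode: 0 for mode in MODES}
    for size, mode in coded_cus:
        area[mode] += size * size  # a square CU covers size x size samples
    total = sum(area.values())
    return {mode: 100.0 * a / total for mode, a in area.items()}


def group_average(distributions):
    """Unweighted mean of per-sequence percentages (assumed averaging rule)."""
    n = len(distributions)
    return {mode: sum(d[mode] for d in distributions) / n for mode in MODES}


# Hypothetical toy sequences, each reduced to its list of final CUs.
seq1 = mode_distribution([(32, "IBC"), (32, "IBC"), (32, "PLT"), (32, "Intra")])
seq2 = mode_distribution([(64, "Intra")])
print(seq1)                         # {'Intra': 25.0, 'IBC': 50.0, 'PLT': 25.0}
print(group_average([seq1, seq2]))  # {'Intra': 62.5, 'IBC': 25.0, 'PLT': 12.5}
```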
