Deep Learning Driven: A Convolutional Encoder-Decoder Network Improves Video Intra Prediction Efficiency

Research paper


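This article discusses a technique in video coding: using a convolutional encoder-decoder (CED) network for video intra prediction. Intra prediction is a key tool in High Efficiency Video Coding (HEVC); its purpose is to remove the spatial redundancy of video content and thereby reduce the bit rate and bandwidth requirement. Traditional intra prediction methods mainly extrapolate or interpolate from the surrounding, already encoded pixels. This cannot guarantee accurate prediction of complex texture, and efficiency suffers especially in regions with weak spatial correlation.

As a deep learning model, a convolutional encoder-decoder network learns patterns and features from data and can therefore capture texture and local structure in images more effectively. Such a network typically has an encoding stage, in which convolution and pooling layers progressively reduce the spatial dimensions of the input and extract key features, and a decoding stage, in which deconvolution (transposed convolution) operations progressively restore image detail while using the encoded information to form the prediction. This improves the prediction of complex elements such as edges and textures, which raises prediction accuracy and coding efficiency.

To address the weakness of the traditional methods, the researchers apply a convolutional encoder-decoder to the video intra prediction task, aiming at a more intelligent and adaptive prediction algorithm. The authors, from Shanghai University and Jiaxing Vocational and Technical College, compare the CED network with traditional methods experimentally and analyze its gains within HEVC coding, including coding speed, compression ratio, and visual quality.

The article's main contributions may include:

1. A video intra prediction model based on a convolutional neural network architecture that handles complex textural structure better.
2. A description of the training strategy and optimization methods for the CED network in video coding, to keep the model stable and effective in practical use.
3. Experimental results showing the advantages of the CED network in HEVC, in particular higher prediction accuracy and coding efficiency.
4. A discussion of possible future research directions and applications, such as real-time video streaming, low-latency coding, or machine-learning-based adaptive coding.

This research paper is valuable for understanding how deep learning can improve existing video coding standards, and it also offers a new angle for other areas of image processing, such as image inpainting. Future work may further explore how to combine deep learning with other coding techniques to achieve higher coding efficiency and better compression performance.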
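To make the encoder-decoder structure described in the summary more concrete, the following Python sketch traces a block through one convolution and pooling step (the encoder) and one stride-2 transposed convolution (the decoder). It only illustrates how the spatial size shrinks and is then restored. The excerpt does not give IPCED's actual layer configuration, so the function names, the single-layer depth, the kernel sizes, and the hand-set, untrained weights are all assumptions made for this example.

```python
# Toy encoder-decoder pass in pure Python. Everything here is hypothetical:
# a real CED has many learned layers and channels; this has one of each.

def conv3x3_relu(img, kernel):
    """3x3 convolution (cross-correlation, as in most deep-learning
    frameworks) with zero padding, followed by ReLU."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = max(acc, 0.0)
    return out


def max_pool2x2(img):
    """Halve each spatial dimension by keeping the maximum of every 2x2 cell."""
    return [[max(img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1])
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def transposed_conv2x2(img, kernel):
    """Stride-2 transposed convolution: each input value paints a 2x2 patch."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            for dy in (0, 1):
                for dx in (0, 1):
                    out[2 * y + dy][2 * x + dx] += img[y][x] * kernel[dy][dx]
    return out


def encode_decode(block, enc_kernel, dec_kernel):
    code = max_pool2x2(conv3x3_relu(block, enc_kernel))  # H x W -> H/2 x W/2
    return code, transposed_conv2x2(code, dec_kernel)    # back to H x W


block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # hand-set weight, not learned
code, recon = encode_decode(block, identity, [[1, 1], [1, 1]])
```

Here `code` is a 2x2 representation of the 4x4 input and `recon` is again 4x4. In a trained network the kernels would be learned so that the decoder output approximates the target block rather than a blocky copy of the input.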


ARTICLE IN PRESS


Neurocomputing xxx (xxxx) xxx

Contents lists available at ScienceDirect

Neurocomputing

journal homepage: www.elsevier.com/locate/neucom

Video intra prediction using convolutional encoder decoder network

Zhipeng Jin ∗, Ping An ∗, Liquan Shen

School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China

Jiaxing Vocational and Technical College, Jiaxing 314036, China

Article info

Article history:

Received 11 April 2018

Revised 8 December 2018

Accepted 1 February 2019

Available online xxx

Keywords:

Video coding

Intra prediction

Image inpainting

Convolutional encoder-decoder network (CED)

High Efficiency Video Coding (HEVC)

Abstract

Intra prediction is an effective method for video coding to remove the spatial redundancy of content. Classical intra prediction methods usually create a prediction block by extrapolating the encoded pixels surrounding the target block. However, existing methods cannot guarantee prediction efficiency for rich textural structure, especially when only weak spatial correlation exists between the target block and the reference pixels. To remedy this issue, this paper proposes a novel intra prediction method via a convolutional encoder-decoder network, which we term IPCED. IPCED can learn and extract the internal representation of reference blocks, and progressively generate a prediction block from this representation. IPCED is a data-driven method, which represents an improvement over hand-crafted methods, and is capable of improving the accuracy of intra prediction. Extensive experimental results demonstrate that IPCED can generate higher-quality intra prediction results, achieving 3.41%, 3.07% and 3.44% bitrate savings for the Y/Cb/Cr channels compared with the HEVC baseline, which is significantly beyond existing methods.

Introduction

Intra prediction methods play an important role in current state-of-the-art video coding standards [1], as they provide an efficient solution to reduce signal energy by prediction from spatially neighboring encoded pixels. In order to capture the finer edge directions present in natural images, High Efficiency Video Coding (HEVC) employs 35 intra prediction modes, which include planar mode, DC mode, and 33 angular prediction modes [2]. Furthermore, in the Joint Exploration Model (JEM) [3], which is under development, the number of angular prediction modes has been extended to 65. These fine-grained modes provide more accurate prediction than the intra prediction in H.264/AVC, which has only 9 modes [4].

This work was supported in part by the National Natural Science Foundation of China under Grants 61571285 and 61801006, the Shanghai Science and Technology Commission under Grants 17DZ2292400 and 18XD1423900, and the Zhejiang Provincial Natural Science Foundation of China under Grant No. LGF20F020003.

∗ Corresponding authors.

E-mail addresses: 364043283@qq.com (Z. Jin), anping@shu.edu.cn (P. An), jsslq@shu.edu.cn (L. Shen).

Video intra prediction is a well-studied and challenging task. The classical method creates a prediction block by extrapolating the reference pixels surrounding the target block, as shown in Fig. 1. For angular prediction, each pixel in the current block is projected onto the nearest reference line along the angular direction, and the projected pixel is used as the prediction. A linear interpolation filter with 1/32-pixel accuracy is used to generate the reference line, and the filter coefficients are inversely proportional to the two distances between the projected fractional position and its two adjacent integer positions. In essence, angular prediction in HEVC is a copying-based process that assumes image content follows a pure direction of propagation. For DC mode, the prediction is the average of all the reference pixels. For planar mode, bilinear interpolation is used to create the prediction block. However, all these modes together are still too simple to fully characterize the complex non-linear relationship between the reference pixels and the target block.
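As a rough illustration of the three classical modes described above, the following Python sketch implements simplified angular, DC and planar prediction for an n x n block. It assumes n is a power of two, handles only vertical-class angles from 0 to 32 (in 1/32-pixel units per row), averages the n top and n left neighbours for DC, and omits the reference smoothing and boundary filters that HEVC also applies. The function names and the layout of the reference arrays are choices made for this example and are not taken from the paper.

```python
# Simplified sketch of HEVC-style intra prediction for an n x n block.
# Assumptions (not from the paper): n is a power of two, only vertical-class
# angles 0..32 are handled, and HEVC's smoothing/boundary filters are omitted.

def angular_predict(ref, n, angle):
    """ref[0] is the top-left corner, ref[1..2n] the top and top-right pixels.
    angle is the horizontal displacement per row in 1/32-pixel units."""
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        offset = (y + 1) * angle
        idx, frac = offset >> 5, offset & 31  # integer part, 1/32 fraction
        for x in range(n):
            base = x + idx + 1
            if frac == 0:
                pred[y][x] = ref[base]  # lands on an integer position: copy
            else:
                # Two-tap linear filter: the nearer neighbour gets the larger weight.
                pred[y][x] = ((32 - frac) * ref[base]
                              + frac * ref[base + 1] + 16) >> 5
    return pred


def dc_predict(top, left, n):
    """Fill the block with the rounded mean of the n top and n left neighbours."""
    shift = n.bit_length()  # equals log2(n) + 1 when n is a power of two
    dc = (sum(top[:n]) + sum(left[:n]) + n) >> shift
    return [[dc] * n for _ in range(n)]


def planar_predict(top, left, n):
    """top[n] is the top-right pixel and left[n] the bottom-left pixel."""
    shift = n.bit_length()
    top_right, bottom_left = top[n], left[n]
    return [[((n - 1 - x) * left[y] + (x + 1) * top_right
              + (n - 1 - y) * top[x] + (y + 1) * bottom_left + n) >> shift
             for x in range(n)]
            for y in range(n)]


ref = [10 * i for i in range(9)]        # ramp: corner followed by 8 top pixels
half_pel = angular_predict(ref, 4, 16)  # half a pixel of shift per row
vertical = angular_predict(ref, 4, 0)   # pure vertical copy of the top row
```

With the ramp reference, `half_pel[0][0]` falls midway between `ref[1]` and `ref[2]`, while `vertical` repeats the top row in every row. Both are fixed copy or linear-blend rules, which is the limitation the paragraph above points out.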

Many works have sought to further improve intra prediction efficiency. Kamisli et al. [5,6] model the correlation between adjacent pixels as a first-order 2D Markov process, where each pixel is predicted by linearly weighting several adjacent pixels. Lai et al. [7] propose an error-diffused intra prediction algorithm for HEVC. In addition, Chen et al. [8] incorporate an ordered dither technique into intra prediction instead of error diffusion, to reduce computational complexity. Chen et al. [9] propose a copying-based method to improve intra prediction. Lucas et al. [10] propose an intra prediction framework based on adaptive linear filters with sparsity constraints. Dias et al. [11] propose an improved combined intra prediction (CIP) method, which uses both the reference pixels and the prediction pixels generated by the intra prediction modes. Li et al. [12] propose a piece-wise linear projection method based on canonical correlation analysis (CCA), to better exploit local spatial correlations. However, these aforementioned works are single

https://doi.org/10.1016/j.neucom.2019.02.064

Please cite this article as: Z. Jin, P. An and L. Shen, Video intra prediction using convolutional encoder decoder network, Neurocomputing, https://doi.org/10.1016/j.neucom.2019.02.064
