64位多核平台上的HEVC DCT/IDCT SIMD加速优化

75 浏览量更新于2024-08-28 收藏 939KB PDF 举报

本文主要探讨了如何通过高效的SIMD（Single Instruction Multiple Data）技术来加速在高效率视频编码（HEVC）中的离散余弦变换（DCT）和逆离散余弦变换（IDCT）。HEVC标准旨在显著提高编码效率，但随之而来的是计算复杂度的显著增加，其中DCT和IDCT操作是编码过程中频繁执行且耗时的部分。作者Lingyu Li、Xiaoyun Zhang和Zhiyong Gao来自上海交通大学图像通信与网络工程研究所，他们针对64位多核平台提出了SIMD加速策略。他们的研究重点在于减少DCT和IDCT过程中的中间变量位宽处理，同时提升并行处理能力，从而在保持压缩效率的前提下显著降低计算复杂度。在优化方案中，他们采用低精度数据处理，减少了数据在计算过程中的存储和传输开销。实验结果显示，在64位Tile RAM多核平台上，通过提出的SIMD实现，可以将DCT和IDCT的计算复杂度分别降低大约40%到70%。这表明SIMD技术能够有效地提升HEVC编码的性能，对于大规模视频编码任务来说，这种优化具有重大的实际应用价值和商业潜力。此外，文章还可能涵盖了SIMD指令集的选择、并行化策略的设计、性能基准测试和与其他优化方法的比较等内容，以全面展示SIMD加速对HEVC DCT和IDCT操作效率提升的深度分析。整体来看，这篇研究论文为高性能视频编码技术的发展提供了新的思路和实践指导，对于提高视频编码的实时性和能效比具有重要意义。

Efficient SIMD Acceleration of DCT and IDCT for High Efficiency Video Coding

Lingyu Li, Xiaoyun Zhang, Zhiyong Gao

Institute of Image Communication and Network Engineering

Shanghai Jiao Tong University

Shanghai, China

{lilingyu, xiaoyun.zhang, zhiyong.gao}@sjtu.edu.cn

Abstract—The promising High Efficiency Video Coding

(HEVC) standard aims at much higher coding efficiency, but

the cost is the greatly increased computation complexity.

Among all the coding modules, DCT and IDCT are frequently

called and bring a lot of complexity burden. While, the Single

Instruction Multiple Data (SIMD) technique has been widely

used to speed up media data process. In this paper, we focus on

SIMD acceleration of HEVC DCT and IDCT, and the

implementation is conducted on 64-bit multicore platform. For

DCT optimization, intermediate variables are processed in less

bit width and parallel processing level is significantly improved

with little compression efficiency loss. Experiment results on

64-bit Tilera multicore platform exhibit that the proposed

SIMD implementation can greatly reduce about 40%-70%

computational complexity of DCT and IDCT with negligible

compression performance loss.

Keywords- simd; dct; idct; hevc; tilera

I. INTRODUCTION

The Joint Collaborative Team on Video Coding (JCT-VC)

has published the next generation video coding standard

referred to as High Efficiency Video Coding (HEVC).

HEVC is expected to provide 200% compression efficiency

over the current standard H.264. However, high compression

efficiency is achieved at the cost of much more

computational complexity, which has become a serious

problem for the real-time video codec [1].

In video codec, Discrete Cosine Transform (DCT) plays

a vital role in video compression. The transform tool in

HEVC is far more complicated than the H.264 standard.

HEVC can support various transform sizes ranging from 4x4

to 32x32. Besides, for the transform of 4x4 intra TU

(transform unit) of luma component, an approximation to the

discrete sine transform (DST) is applied [2]. Compared with

H.264, the coefficients of DCT transform more complicated,

which results in more multiplications instead of shifts and

additions. In addition, the intermediate variables need more

bit width, which reduces the parallel processing level.

In recent years, most modern processors provide media

instructions to improve the computational performance.

Single instruction multiple data (SIMD) is a very efficient

tool in processing media data. By exploiting the data-level

parallelism, SIMD technologies provide a series of effective

approaches for fast algorithm implementation. On Intel

platform, the MMX/SSE technology is a typical example of

SIMD [10]. On popular Intel and ARM platforms, there are

some SIMD based algorithms for HEVC[5-6]. The TILE-

Gx36 is a system-on-chip 36-core processor of Tilera family

[12]. Each of the 36 processor cores is a full-fledged 64-bit

processor. The processor instruction set architecture (ISA)

includes a rich set of SIMD instructions. Due to the low

power consumption and effective parallel processing ability,

some HEVC codec implementation work has been done on

Tilera platform [7-9].

We focus our work on the SIMD acceleration of HEVC

DCT and IDCT modules on Tilera multicore platform. The

rest of this paper is organized as follows. An introduction of

HEVC DCT and IDCT is given in Section II. Section III

presents our proposed SIMD acceleration method of HEVC

DCT and IDCT on Tilera platform. Section IV provides the

acceleration results. At last, Section V concludes the paper.

II. INTRODUCTION OF HEVC DCT AND IDCT

Similar to previous video coding standards, DCT and

IDCT modules in HEVC are used to transform the prediction

residues to eliminate spatial redundancy. Two-dimensional

DCT and IDCT are calculated by applying horizontal and

vertical one-dimensional DCT and IDCT. The DCT and

IDCT of HEVC can support different TB (transform block)

sizes: 4x4, 8x8, 16x16 and 32x32.

Figure 1. Coefficient matrix of 16x16 DCT

In H.264, the one-dimensional transform can be

implemented by matrix multiplication and dot product. First,

the multiplication of the input matrix and the kernel

transform matrix is calculated. The coefficients of kernel

transform matrix are +1, -1, +2, -2, which indicates the

computation of the matrix multiplication can be done by

addition and shift operations. Then, compute the dot product

of the kernel transform result and a scaling matrix and this

下载后可阅读完整内容，剩余4页未读，立即下载

weixin_38660327

粉丝: 8

64位多核平台上的HEVC DCT/IDCT SIMD加速优化

超高速IDCT 的原理及实现

VC++DCT变换源码

mmx.rar_IDCT_mmx_mmx idct_visual c

project-CSharp-DCT.rar_c# dct_c# transform_dct_project

dct_main.zip_DCT c++ CODE

TMS320C6000系列下mpeg-2视频压缩标准实现的源代码

H.264 协议 c 语言实现

IDCT汇编代码实现：多版本优化策略解析

C++ Builder 中的逆离散余弦变换实现

C/C++实现图像处理中的离散余弦变换技术

最新资源