Low-Power, High-PSNR FPGA DCT/IDCT Architecture Based on Adaptive Recoding CORDIC

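This paper proposes a low-power, high-PSNR (peak signal-to-noise ratio) discrete cosine transform (DCT) / inverse discrete cosine transform (IDCT) architecture based on adaptive recoding CORDIC (Adaptive Recoding Coordinate Rotation Digital Computer, ARC), intended for image and video compression standards. Two types of ARC rotators, together with an efficient adder- and shifter-based scale factor approximation, yield a unified architecture for DCT and IDCT. The architecture was validated on an FPGA to confirm its function and evaluate its performance.

The paper describes the design of this ARC-based DCT/IDCT architecture in detail. It targets image and video processing, where DCT and IDCT are key signal processing steps that are widely used in compression standards such as JPEG and MPEG. CORDIC is a simple and computationally efficient algorithm that uses coordinate rotations, built from adders and shifters, to evaluate a range of mathematical operations, including multiplication and division.

The architecture proposed by Jianfeng Zhang and his co-authors aims to reduce power consumption while improving PSNR, an important measure of image quality. It uses two types of ARC rotators. According to the paper, ARC improves on conventional CORDIC in both accuracy and rotation speed.

A further contribution is the scale factor approximation built from adders and shifters. Conventional designs usually handle the scale factor with floating-point or fixed-point multiplication, which increases power consumption and hardware complexity. The approximation reduces computing resources and power while keeping accuracy high.

To validate the architecture, the authors implemented it on an FPGA (Field-Programmable Gate Array), a programmable logic device well suited to rapid prototyping and high-performance hardware implementation. The FPGA implementation let them evaluate latency, power and hardware resource usage against existing solutions.

The result is a low-power, high-PSNR DCT/IDCT solution, which matters most for portable and embedded devices with strict power budgets. By using adaptive recoding CORDIC, the architecture may also enable more compact and energy-efficient image processing hardware without sacrificing performance.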

FPGA Implementation of Low-Power and High-PSNR DCT/IDCT Architecture based on Adaptive Recoding CORDIC

Jianfeng Zhang
State Key Laboratory of High Performance Computing
College of Computer
National University of Defense Technology
Changsha, China
Email: jianfengzhang@nudt.edu.cn

Paul Chow
Department of Electrical and Computer Engineering
University of Toronto
Toronto, Canada
Email: pc@eecg.toronto.edu

Hengzhu Liu
State Key Laboratory of High Performance Computing
College of Computer
National University of Defense Technology
Changsha, China
Email: hengzhuliu@nudt.edu.cn

Abstract: The discrete cosine transform (DCT) and its inverse (IDCT) are widely used in image and video compression standards. In this paper, we propose a novel unified architecture for DCT and IDCT based on adaptive recoding coordinate rotation digital computer (ARC). The proposed architecture requires two types of ARC rotators. In addition, it uses an efficient adder- and shifter-based scale factor approximation. To verify its function and evaluate its performance, the proposed architecture is validated on a Virtex-5 FPGA development platform. In DCT-only mode, a state-of-the-art DCT architecture uses 12% more hardware resources, has a 7.12% longer critical path delay, consumes 10.1% more power, and has a 4.8 dB lower PSNR than the proposed architecture. In DCT/IDCT mode, the latest unified DCT/IDCT architecture has 2.17 times the latency, needs 74.9% more hardware resources, and dissipates 52.5% more power than the proposed architecture. In addition, the PSNR of the proposed architecture is 2 dB better.
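As a rough illustration of the adder- and shifter-based scale factor approximation mentioned in the abstract, the sketch below replaces the multiplication by the inverse CORDIC gain with four shifts and three additions or subtractions. The decomposition 2^-1 + 2^-3 - 2^-6 - 2^-9 and the names `cordic_gain` and `scale_shift_add` are assumptions chosen for illustration; the excerpt does not state which terms the paper actually uses.

```python
import math


def cordic_gain(iterations: int) -> float:
    """Gain accumulated by `iterations` conventional CORDIC micro-rotations."""
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


def scale_shift_add(x: int) -> int:
    """Approximate x / gain on an integer using shifts and adds only.

    Illustrative decomposition (an assumption, not taken from the paper):
    1/gain ~= 2^-1 + 2^-3 - 2^-6 - 2^-9
    """
    return (x >> 1) + (x >> 3) - (x >> 6) - (x >> 9)


# Compare the shift-add constant with the exact inverse gain.
exact = 1.0 / cordic_gain(16)
approx = 2.0 ** -1 + 2.0 ** -3 - 2.0 ** -6 - 2.0 ** -9
print(f"exact 1/gain = {exact:.6f}, shift-add = {approx:.6f}")
print(scale_shift_add(1 << 12))
```

Adding more terms tightens the approximation at the cost of extra adders, which is the usual trade-off between accuracy and hardware in this kind of design.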

I. INTRODUCTION

Low power consumption is extremely important in today's embedded systems, especially in portable devices. Owing to their excellent energy compaction [1] and very close approximation to the optimal Karhunen-Loeve transform (KLT) [2], the discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) have been widely applied in image and video compression standards such as JPEG [3], MPEG [4], H.264 [5] and HEVC [6] since they were first introduced [7]. Because DCT and IDCT are computationally intensive, many fast algorithms have been proposed to accelerate the computation, including multiplier-based algorithms [8, 9], distributed arithmetic (DA) based algorithms [10, 11] and coordinate rotation digital computer (CORDIC) based algorithms [12–16]. The multiplier-based and DA-based algorithms achieve a high peak signal-to-noise ratio (PSNR), but they consume too much power because they either require complicated multipliers or use too many hardware resources. CORDIC [17] can compute transcendental functions in parallel using only adders and shifters, and it is also highly suited to implementation on FPGAs [18], so adopting CORDIC to implement DCT and IDCT can reduce architectural complexity and save power. As a result, CORDIC-based implementations of DCT and IDCT are attracting more attention than the other two approaches.

Jianfeng Zhang is currently a visiting PhD student at the University of Toronto.
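To make the shift-and-add idea from the introduction concrete, here is a minimal sketch of a conventional CORDIC rotation. This is the textbook algorithm, not the adaptive recoding variant (ARC) used in the paper. It uses floating point for readability, so each multiplication by 2^-i stands in for a right shift by i bits in fixed-point hardware. The function name `cordic_rotate` and the choice of 16 iterations are assumptions.

```python
import math


def cordic_rotate(x: float, y: float, angle: float, iterations: int = 16):
    """Rotate (x, y) by `angle` radians with conventional CORDIC.

    Converges for |angle| up to about 1.74 rad. Every iteration depends on
    the result of the previous one through x, y and the residual angle z.
    """
    z = angle
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotation direction for this step
        shift = 2.0 ** -i                    # models a right shift by i bits
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)            # arctan values come from a small table in hardware
        gain *= math.sqrt(1.0 + shift * shift)
    # Compensate the accumulated gain (the "scale factor").
    return x / gain, y / gain


cx, cy = cordic_rotate(1.0, 0.0, math.pi / 4)
print(cx, cy)
```

The final division by `gain` is the scale factor compensation; a hardware design would replace it with a fixed constant applied through shifts and adds rather than a divider.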

Because a 2-D DCT is commonly calculated by first applying a 1-D DCT over the rows and then another 1-D DCT over the columns of the input matrix [16], the 1-D DCT is the kernel processing element. Moreover, both DCT and IDCT are used in image and video systems, so designing an efficient unified architecture for the 1-D DCT and IDCT is very important. In this paper, we propose a novel unified architecture for DCT and IDCT based on CORDIC. The proposed architecture uses two different types of CORDIC rotators. The conventional CORDIC [17] has drawbacks, such as excessive iterations, poor accuracy, and especially the data dependence between neighbouring iterations, which restricts the speed significantly. Hence, adaptive recoding CORDIC (ARC) [19] is preferred to improve the accuracy and accelerate the rotation process. The proposed architecture has been synthesized for a Xilinx Virtex-5 LX110T to verify its correctness and performance. Compared to the state-of-the-art DCT and the latest unified DCT/IDCT architectures, the proposed architecture demonstrates significantly improved performance.
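The row-column decomposition described above can be sketched in a few lines. The example below is a direct, unoptimized reference computation and not the CORDIC-based architecture of the paper. It assumes the orthonormal DCT-II definition, which may differ in scaling from the definition the paper uses; the helper names `dct_1d`, `transpose` and `dct_2d` are hypothetical.

```python
import math


def dct_1d(v):
    """Orthonormal 1-D DCT-II computed directly from its definition."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    """2-D DCT as a 1-D DCT over the rows, then a 1-D DCT over the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(c) for c in transpose(rows)]
    return transpose(cols)


# A constant 8x8 block has all of its energy in the DC coefficient.
flat = [[1.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))
```

Because both passes reuse the same `dct_1d` routine, a hardware design only needs one efficient 1-D kernel plus a transpose buffer, which is why the 1-D transform is treated as the kernel processing element.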

The rest of this paper is organized as follows. Related work on DCT and IDCT is discussed in Section II. The background of DCT, IDCT, conventional CORDIC and ARC is described in Section III. In Section IV, we discuss the proposed novel unified DCT/IDCT architecture and the implementations of the ARC rotators. Section V analyzes the simulation and comparison results on an FPGA. Conclusions are drawn in Section VI.

II. RELATED WORK

Much research has been done to improve DCT and IDCT, and it falls into three general groups of methods.

The first group is to minimize the number of multipliers

978-1-4673-9091-0/15/$31.00 ©2015 IEEE
