Hardware-Oriented Adaptive Multi-Resolution Motion Estimation Algorithm and VLSI Architecture Optimization


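This paper, "Hardware-Oriented Adaptive Multi-resolution Motion Estimation Algorithm and Its VLSI Architecture", is by Guoqing Xiang, Huizhu Jia*, Jie Liu, Yuan Li, and Xiaodong Xie of EECS, Peking University. Targeting high-definition video encoders, the authors propose a hardware architecture for an adaptive multi-resolution motion estimation algorithm (AMMEA) that aims to reduce hardware cost. The core idea is a texture-based search strategy that exploits temporal stationarity and spatial homogeneity, using the Sobel edge operator. The basic processing element (PE) is a four-pixel SAD (Sum of Absolute Differences) unit, which is used both for SAD calculation and for Sobel edge operator computation. Sharing the PE in this way is intended to raise data utilization and throughput and to keep the hardware compact.

Using AMMEA with a regular data flow, the simulation results show that the architecture significantly reduces hardware cost with a PSNR (Peak Signal-to-Noise Ratio) loss of only 0.03 dB compared with full search. In other words, the scheme saves hardware resources while largely preserving video quality, which makes it practically relevant for modern video encoding systems.

The full paper may also cover implementation details such as parallel processing, hardware-level optimization strategies, and resource allocation. Its VLSI (Very Large Scale Integration) architecture section may go into circuit-level matters such as the processing flow, memory organization, interface design, and possibly power analysis. The paper offers a hardware strategy for high-performance video encoder design at reduced hardware cost, and may serve as a useful reference for engineers working on hardware design, video coding, or embedded systems.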

Hardware-Oriented Adaptive Multi-resolution Motion Estimation Algorithm and Its VLSI Architecture

Guoqing Xiang, Huizhu Jia*, Jie Liu, Yuan Li, Xiaodong Xie

EECS of Peking University

Beijing, 100871

P.R. China

Email: {gqxiang, hzjia, liuzimin, yuanli, xdxie}@jdl.ac.cn

Abstract

In this paper, we propose a hardware architecture of an adaptive multi-resolution motion estimation algorithm (AMMEA) for high definition video encoders to reduce hardware cost. The texture-based search strategies are based on temporal stationarity and spatial homogeneity with the Sobel edge operator. The proposed algorithm makes motion estimation more concise. We also propose a hardware architecture for the Sobel edge operator. The four-pixel SAD unit, which is the basic processing element (PE) in our proposed architecture, is used for both SAD calculation and Sobel edge operator computation. The hardware architecture achieves very high data utilization and data throughput. Using our proposed AMMEA with regular data flow, simulation results show that the proposed architecture can significantly reduce the hardware cost with a negligible PSNR loss of 0.03 dB compared with full search. The design is implemented in SMIC 0.18 µm CMOS technology with a gate count of 950K, and it supports real-time encoding of 1080P@30fps with two reference frames at a clock frequency of 150 MHz.

Keywords: Multi-resolution motion estimation; adaptive search strategies; VLSI; data re-use; Sobel

I. INTRODUCTION

Motion estimation (ME) is the most complex part of the most popular video compression standards, such as MPEG-1/2/4 and H.264/AVC [1]. The goal of integer motion estimation is to reduce temporal redundancy between the current frame and the reference frame. These video coding standards also use newer techniques such as variable block size motion estimation (VBSME) and multiple reference frames. Therefore, real-time motion estimation for high definition (HD) video encoders poses great challenges for hardware resources and power consumption.
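To give a rough sense of the real-time constraint, the following sketch computes the average cycle budget per macroblock at the operating point quoted in the abstract (1080P@30fps, two reference frames, 150 MHz). The 16x16 macroblock size, the rounding of 1080 rows up to a whole number of macroblocks, and the even split of cycles across reference frames are assumptions made for illustration, not figures from the paper.

```python
# Average clock-cycle budget per macroblock for real-time motion estimation.
# Resolution, frame rate, clock, and reference-frame count are taken from the
# abstract; the 16x16 macroblock and the even per-reference split are assumed.

MB_SIZE = 16

def macroblocks_per_frame(width, height, mb=MB_SIZE):
    """Macroblock count, rounding each dimension up to a multiple of mb."""
    cols = (width + mb - 1) // mb
    rows = (height + mb - 1) // mb
    return cols * rows

def cycles_per_mb(clock_hz, width, height, fps, ref_frames=1):
    """Average cycles available per macroblock per reference frame."""
    mbs_per_second = macroblocks_per_frame(width, height) * fps
    return clock_hz / (mbs_per_second * ref_frames)

if __name__ == "__main__":
    print(macroblocks_per_frame(1920, 1080))                       # 120 * 68 macroblocks
    print(cycles_per_mb(150_000_000, 1920, 1080, 30, ref_frames=2))
```

Under these assumptions a frame holds 8160 macroblocks, which leaves roughly 306 cycles per macroblock per reference frame. A budget of that size is one way to see why exhaustive search over a large window needs a wide, regular PE array.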

*Huizhu Jia, the corresponding author, is with Peking University, and also with Cooperative Medianet Innovation Center and Beida (Binhai) Information Research.

Many fast motion estimation algorithms have been proposed for video coding, such as SEA [2] and DSA [3]. Although these software-oriented algorithms save time, their irregular data flow makes them unsuitable for hardware implementation. A commonly used hardware-friendly ME algorithm is the full-search block matching algorithm (FSBMA) [4], which examines all points in the search window. Because an HD video encoder requires a large search window, the on-chip memory consumption and computational resource cost are huge. The multi-resolution ME algorithm (MMEA), which is built on a coarse-to-fine search hierarchy, is a good choice for VLSI implementation because it achieves a good balance between performance and complexity in HD encoders [5][6]. However, as analyzed in [7], a major problem of traditional MMEA is that it uses a fixed search range and the same down-sampling rate for all sequences without discrimination. For some sequences with complex texture, a wrong motion vector from the coarse level will mislead the search in the fine level, which leads to performance degradation. And for some stationary regions such as background, MMEA searches an unnecessarily large search window and thus wastes computational resources.
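The two search styles discussed above can be sketched in a few lines. This is a minimal software illustration, not the paper's hardware data flow: the block size, search ranges, number of levels, the 2x2 averaging down-sampler, and all function names are assumptions, and only a single candidate is carried between levels.

```python
# Sketch of SAD block matching: exhaustive full search, plus a coarse-to-fine
# search over a 2:1 down-sampled pyramid. Block size, search ranges, number of
# levels, and the 2x2 averaging down-sampler are illustrative assumptions.

def sad(cur, ref, cx, cy, rx, ry, n):
    """SAD between the n x n block of cur at (cx, cy) and of ref at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def full_search(cur, ref, x, y, n, sr, center=(0, 0)):
    """Examine every displacement within +/-sr of center; return (mv, sad)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(center[1] - sr, center[1] + sr + 1):
        for dx in range(center[0] - sr, center[0] + sr + 1):
            rx, ry = x + dx, y + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:  # candidate inside frame
                cost = sad(cur, ref, x, y, rx, ry, n)
                if best is None or cost < best[1]:
                    best = ((dx, dy), cost)
    return best

def downsample(img):
    """Halve the resolution by averaging each 2x2 group of pixels."""
    return [[(img[2 * j][2 * i] + img[2 * j][2 * i + 1]
              + img[2 * j + 1][2 * i] + img[2 * j + 1][2 * i + 1]) // 4
             for i in range(len(img[0]) // 2)]
            for j in range(len(img) // 2)]

def multires_search(cur, ref, x, y, n=16, levels=3, coarse_sr=4, refine_sr=1):
    """Wide search at the coarsest level, then small refinements at finer levels."""
    pyramid = [(cur, ref)]
    for _ in range(levels - 1):
        c, r = pyramid[-1]
        pyramid.append((downsample(c), downsample(r)))
    mv, cost = (0, 0), None
    for lvl in range(levels - 1, -1, -1):
        c, r = pyramid[lvl]
        scale = 1 << lvl
        sr = coarse_sr if lvl == levels - 1 else refine_sr
        mv, cost = full_search(c, r, x // scale, y // scale, n // scale, sr, mv)
        if lvl > 0:
            mv = (2 * mv[0], 2 * mv[1])  # project the vector to the finer level
    return mv, cost

if __name__ == "__main__":
    # Synthetic smooth surface (not 8-bit video); cur is ref displaced by (4, -8).
    g = lambda px, py: px * px + py * py
    ref = [[g(px, py) for px in range(64)] for py in range(64)]
    cur = [[g(px + 4, py - 8) for px in range(64)] for py in range(64)]
    print(full_search(cur, ref, 16, 32, 16, 8))
    print(multires_search(cur, ref, 16, 32))
```

On this synthetic input both calls recover the displacement (4, -8) with a SAD of 0. With the illustrative defaults, the hierarchy evaluates 81 + 9 + 9 = 99 candidates, most of them on small down-sampled blocks, whereas a full search over ±16 would evaluate 1089 candidates at full resolution. The smooth test surface is a deliberately favorable case: on complex texture the coarse level can pick a wrong vector, which is the weakness the text attributes to traditional MMEA.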

In this paper, a hardware-oriented adaptive multi-resolution motion estimation algorithm (AMMEA) and its VLSI architecture are proposed. AMMEA customizes the search strategy for each block based on the stationary and homogeneous features of the current macroblock (MB). The four-pixel SAD unit, which is the basic PE, is applied to both search strategy determination and multi-resolution motion estimation. The proposed architecture makes full use of the PEs by sharing them between SAD calculation and Sobel edge operator computation. This achieves a better balance between hardware resources and performance.
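As a rough illustration of a per-macroblock strategy decision, the sketch below measures temporal stationarity with the SAD against the co-located reference block and spatial homogeneity with the Sobel edge operator. The Sobel kernels and the SAD measure are standard; the thresholds, the three category names, the order of the tests, and the function names are hypothetical and do not come from the paper.

```python
# Hypothetical per-macroblock search-strategy decision. The Sobel operator and
# the SAD measure are standard; thresholds and category names are placeholders
# chosen for illustration, not values or classes from the paper.

def sobel_activity(img, x, y, n=16):
    """Sum of |Gx| + |Gy| over the interior pixels of the n x n block at (x, y)."""
    total = 0
    for j in range(y + 1, y + n - 1):
        for i in range(x + 1, x + n - 1):
            gx = (img[j - 1][i + 1] + 2 * img[j][i + 1] + img[j + 1][i + 1]
                  - img[j - 1][i - 1] - 2 * img[j][i - 1] - img[j + 1][i - 1])
            gy = (img[j + 1][i - 1] + 2 * img[j + 1][i] + img[j + 1][i + 1]
                  - img[j - 1][i - 1] - 2 * img[j - 1][i] - img[j - 1][i + 1])
            total += abs(gx) + abs(gy)
    return total

def colocated_sad(cur, ref, x, y, n=16):
    """SAD between the current block and the co-located reference block."""
    return sum(abs(cur[y + j][x + i] - ref[y + j][x + i])
               for j in range(n) for i in range(n))

def choose_strategy(cur, ref, x, y, n=16, t_static=512, t_flat=2048):
    """Classify the block; both thresholds are placeholders."""
    if colocated_sad(cur, ref, x, y, n) < t_static:
        return "stationary"    # little temporal change at the co-located position
    if sobel_activity(cur, x, y, n) < t_flat:
        return "homogeneous"   # low edge activity inside the block
    return "textured"          # high edge activity inside the block

if __name__ == "__main__":
    flat = [[100] * 16 for _ in range(16)]
    ramp = [[8 * i for i in range(16)] for _ in range(16)]
    print(choose_strategy(ramp, ramp, 0, 0))
    print(choose_strategy(ramp, flat, 0, 0))
```

An encoder could then map each label to a search range or down-sampling choice, for example a small window for stationary blocks. That mapping is left out here because the excerpt does not specify it.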

The remainder of this paper is arranged as follows. Section II gives a brief introduction to the hardware-oriented AMMEA. In Section III, the overall VLSI architecture is developed and a search strategy decision block based on a reconfigurable PE array is proposed. The simulation results and comparisons with previous works are given in Section IV. Finally, conclusions are drawn in Section V.

II. ADAPTIVE MULTI-RESOLUTION MOTION ESTIMATION ALGORITHM

In this section, we introduce the adaptive multi-resolution motion estimation algorithm [7]. The traditional three-level MMEA [5][6] is developed with a coarse-to-fine hierarchical search. In the coarsest level, potential match candidates in the reference are obtained from the largest search window. In the modestly coarse level, several search windows are centered at the candidates provided by the immediate upper coarse level. In the finest level, the search range (SR) will be set to be very small for calculation reduction and is
