Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions
Yedidya Hilewitz and Ruby B. Lee
Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA
{hilewitz, rblee}@princeton.edu

Abstract
Current microprocessor instruction set
architectures are word-oriented, with some subword
support. Many important applications, however, can
realize substantial performance benefits from
bit-oriented instructions. We propose the parallel extract
(pex) and parallel deposit (pdep) instructions to
accelerate compressing and expanding selections of
bits. We show that these instructions can be
implemented by the fast inverse butterfly and butterfly
network circuits. We evaluate latency and area costs
of alternative functional units for implementing
subsets of advanced bit manipulation instructions. We
show applications exhibiting significant speedup,
3.41× on average over a basic RISC architecture, and
2.48× on average over an instruction set architecture
(ISA) that supports extract and deposit instructions.
1. Introduction
Operations on microprocessors are typically
word-oriented and, more recently, subword-oriented [1]. However,
many important applications benefit from bit-oriented
operations. For example, arbitrary n-bit permutations
take O(n) operations using basic instructions such as
and, shift and or to move individual bits [2]. A
few fixed permutations, such as in ciphers like DES,
have been optimized by table lookup [2], but still take
tens to hundreds of cycles due to cache misses.
Recent research showed that specialized bit-oriented
instructions can permute bits in O(lg n) [2-4] or even
O(1) operations [5,6]. For example, for n = 64, any one
of 64! bit permutations can be achieved in 1 or 2
cycles by butterfly (bfly) and inverse butterfly
(ibfly) permutation instructions [5,6]. Such speedup
can enable previously difficult bit manipulation
computations to be done much more efficiently.
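A single stage of such a butterfly network can be sketched in software with the well-known delta-swap idiom (a software sketch only; the cited instructions realize the full network in hardware, and the function name here is ours):

```c
#include <stdint.h>

/* Conditionally swap bit pairs at distance `shift`: wherever mask has a 1
   (in the lower position of a pair), bits i and i+shift of x are exchanged.
   One such step corresponds to one stage of a butterfly network; a full
   64-bit butterfly is lg(64) = 6 stages with shift = 32, 16, 8, 4, 2, 1. */
uint64_t delta_swap(uint64_t x, uint64_t mask, unsigned shift) {
    uint64_t t = (x ^ (x >> shift)) & mask;
    return x ^ t ^ (t << shift);
}
```

For example, delta_swap(x, 0x00000000FFFFFFFFULL, 32) exchanges the two 32-bit halves of x in one stage; choosing the per-stage masks is what configures the network for a particular permutation.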
This paper discusses another important class of
bit-oriented operations involving selecting and
compressing bits, and distributing bits according to
different bit patterns. We call these parallel extract
(pex) and parallel deposit (pdep) operations,
respectively. pdep and pex can also be viewed as
bit-level scatter and gather instructions. These
operations are important in application domains such
as bioinformatics, image processing, steganography,
cryptanalysis and coding.
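The semantics of the two operations can be illustrated with a bit-serial reference loop in C (purely illustrative — the point of the paper is that hardware performs this in one or two cycles; the function names are ours):

```c
#include <stdint.h>

/* pex (parallel extract, bit gather): select the bits of x at positions
   where mask has a 1 and compress them into the low-order bits of the
   result, preserving their relative order. */
uint64_t pex_ref(uint64_t x, uint64_t mask) {
    uint64_t r = 0;
    unsigned k = 0;
    for (unsigned i = 0; i < 64; i++)
        if ((mask >> i) & 1)
            r |= ((x >> i) & 1) << k++;
    return r;
}

/* pdep (parallel deposit, bit scatter): take the low-order bits of x and
   scatter them, in order, to the positions where mask has a 1. */
uint64_t pdep_ref(uint64_t x, uint64_t mask) {
    uint64_t r = 0;
    unsigned k = 0;
    for (unsigned i = 0; i < 64; i++)
        if ((mask >> i) & 1)
            r |= ((x >> k++) & 1) << i;
    return r;
}
```

For example, pex_ref(0xB4, 0xF0) gathers bits 7..4 of 0xB4 (1011) into 0xB, and pdep_ref(0xB, 0xF0) scatters them back to 0xB0; over the masked positions the two operations are inverses.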
We present the architectural definition of these
two novel bit instructions. We show how pdep can be
implemented using the single-cycle butterfly network
datapath. We evaluate alternative new functional units
that implement useful subsets of these advanced bit
manipulation instructions, and recommend one that is
smaller than an ALU and has shorter latency. Our
performance results indicate that a processor enhanced
with pex and pdep achieves a 5.2× maximum
speedup, 3.41× on average, over a basic RISC
architecture.
Section 2 describes the new pex and pdep
instructions. Section 3 presents the ISA definitions.
Section 4 discusses the implementation and different
options for a new functional unit implementing
advanced bit-oriented instructions. Section 5 describes
applications of these instructions and section 6 their
performance. Section 7 concludes the paper.
2. Parallel extract and parallel deposit
It is often necessary to select non-contiguous bits
from data. For example, in pattern matching, many
pairs of features may be compared. Then a subset of
these comparison result bits is selected, compressed,
and used as an index to look up a table. This selection
and compression of bits is what a pex instruction does
(Figure 1(b)). A pex instruction can also be viewed as
a parallel version of the extract (extr) instruction [7,
8]. The extr instruction extracts a single field of bits
from any position in the source register and right-justifies it in the destination register.
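In software, the effect of extr on an unsigned field can be sketched with a shift and mask (the function and parameter names here are ours):

```c
#include <stdint.h>

/* Extract the len-bit field of x starting at bit position pos and
   right-justify it in the result, zero-extended (1 <= len <= 64). */
uint64_t extr_sketch(uint64_t x, unsigned pos, unsigned len) {
    uint64_t mask = (len >= 64) ? ~0ULL : ((1ULL << len) - 1);
    return (x >> pos) & mask;
}
```

For example, extr_sketch(0xABCD, 4, 8) yields 0xBC. pex generalizes this from one contiguous field to an arbitrary scattered selection of bits.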
This work was supported in part by the DoD and Intel.
Yedidya Hilewitz is a Hertz Foundation Fellow.
Yedidya Hilewitz and Ruby B. Lee, "Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions," Proceedings of the IEEE 17th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp. 65-72, September 11-13, 2006.