Fast and Efficient Compression of Floating-Point Data
Peter Lindstrom
LLNL
Martin Isenburg
UC Berkeley
Abstract
Large scale scientific simulation codes typically run on a cluster of CPUs that write/read time steps to/from a single file system. As data sets are constantly growing in size, this increasingly leads to I/O bottlenecks. When the rate at which data is produced exceeds the available I/O bandwidth, the simulation stalls and the CPUs are idle. Data compression can alleviate this problem by using some CPU cycles to reduce the amount of data that needs to be transferred. Most compression schemes, however, are designed to operate offline and aim to maximize compression rate, not online throughput. Furthermore, they often require quantizing floating-point values onto a uniform integer grid, which disqualifies their use in applications where exact values must be retained.
We propose a simple and robust scheme for lossless, online compression of floating-point data that transparently integrates into the I/O of a large scale simulation cluster. A plug-in scheme for data-dependent prediction makes our scheme applicable to a wide variety of data sets used in visualization, such as unstructured meshes, point sets, images, and voxel grids. We achieve state-of-the-art compression rates and compression speeds, the latter in part due to an improved entropy coder. We demonstrate that this significantly accelerates I/O throughput in real simulation runs. Unlike previous schemes, our method also adapts well to variable-precision floating-point and integer data.
CR Categories: E.4 [Coding and Information Theory]: Data compaction and compression
Keywords: high throughput, lossless compression, file compaction for I/O efficiency, fast entropy coding, range coder, predictive coding, large scale simulation and visualization.
1 Introduction
Data sets from scientific simulation and scanning devices are growing in size at an exponential rate, placing great demands on memory and storage availability. Storing such data uncompressed results in large files that are slow to read from and write to disk, often causing I/O bottlenecks in simulation, data processing, and visualization that stall the application. With disk performance lagging increasingly behind the frequent doubling in CPU speed, this problem is expected to become even more urgent over the coming years.
A large scale simulation may run on a cluster of hundreds to thousands of supercomputer nodes that write the results of each time step to a shared file system for subsequent analysis and visualization [24]. Typically this involves storing large amounts of single- or double-precision floating-point numbers that represent one or more variables of simulation state per vertex/cell. When the rate at which the simulation can be updated exceeds the available I/O bandwidth, the simulation stalls and the CPUs are idle.
Data compression strategies have the potential to combat this problem. By making use of excess CPU cycles, data can be compressed and decompressed on the fly to reduce the number of bytes that need to be transferred between memory and disk or across file systems, effectively boosting I/O performance at little or no cost while reducing storage requirements.
The visualization community has developed compression schemes for unstructured data such as point sets [6, 3], triangular [27, 18], polygonal [19, 13], tetrahedral [11, 2], and hexahedral [14] meshes, and for structured data such as images and voxel grids [8, 12]. However, most of these schemes are designed to maximize compression rate rather than data throughput. They are commonly applied as an offline process after the raw, uncompressed data has already been stored on disk. In order to maximize effective throughput, one must consider how to best balance compression speed and available I/O bandwidth, while at the same time supporting sufficiently efficient decompression. While higher compression rates improve effective bandwidth, this gain often comes at the expense of a slow and complex coding scheme.
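To make this trade-off concrete, consider a rough back-of-the-envelope model (our notation, not taken from the original text): let B be the raw disk bandwidth, C the compressor's throughput on uncompressed bytes, and r the achieved compression ratio. Compressing and then writing one byte of input takes 1/C + 1/(rB) seconds, so the effective write rate is

    T = 1 / (1/C + 1/(rB)),

which exceeds the raw rate B precisely when C > r/(r-1) * B. At a 2:1 ratio, for instance, the compressor must run at least twice as fast as the disk before compression pays off at all, which is why a fast coder matters as much as a high compression rate.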
Furthermore, prior methods often expect that vertex positions and field values can be quantized onto a uniform integer grid for efficient (but lossy) predictive compression. This modifies the original data, as the non-linear precision of floating-point numbers cannot be preserved. In many science and engineering applications, however, exact values must be retained, e.g. for checkpoint dumps of simulation state and for accurate analysis and computation of derived quantities such as magnitudes, curls, fluxes, critical points, etc. The use of uniform quantization is also prohibited for data sets that exploit the non-linearity of the floating-point representation to allocate more precision to important features by specifically aligning them with the origin. Quantization can also change geometric relationships in the data (e.g. triangle orientation, Delaunay properties). Finally, scientists are often particular about their data and will simply refrain from using a compression scheme that modifies it.
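The precision loss described above is easy to demonstrate. The snippet below (our illustration, with an assumed [-1, 1] data range and a 16-bit grid; none of these choices come from the paper) quantizes a small value near the origin onto a uniform grid and fails to recover it:

#include <cmath>
#include <cstdio>

// Illustration (not part of the proposed scheme): uniform 16-bit quantization
// of a float field.  Values near zero, where the floating-point format
// concentrates precision, lose the most relative accuracy.
int main() {
    const float lo = -1.0f, hi = 1.0f;               // assumed data range
    const int bits = 16;
    const float step = (hi - lo) / ((1 << bits) - 1); // grid spacing

    float x = 1.0e-6f;                                // small value near the origin
    long  q = std::lround((x - lo) / step);           // quantize to grid index
    float y = lo + q * step;                          // dequantize

    // y differs from x by roughly half a grid step: the exact value is lost.
    std::printf("original %.9g, reconstructed %.9g\n", x, y);
    return 0;
}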
To address these needs, we propose a novel and surprisingly simple scheme for fast, lossless, online compression of floating-point data using predictive coding. It provides a well balanced trade-off between computation speed and data reduction and can be integrated almost transparently with standard I/O. Our scheme makes no assumptions about the nature of the data to be compressed, but relies on a plug-in scheme for computing data-dependent predictions. It is hence applicable to a wide variety of data sets used in visualization, such as unstructured meshes, point sets, images, and voxel grids. In contrast to many previous schemes, our method naturally extends to compression of adaptively quantized floating-point values and to coding of integer data.
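To make the idea concrete, the sketch below gives a minimal form of lossless predictive float coding (our simplification with hypothetical names, not the paper's exact algorithm): each IEEE float is mapped to an order-preserving unsigned integer, and only the difference between the actual and predicted values is handed to the entropy coder. Because the mapping is invertible and the arithmetic wraps modulo 2^32, reconstruction is bit-exact.

#include <cstdint>
#include <cstring>

// Map a float's bit pattern to an unsigned integer that is monotonic in the
// float's value, so numerically close floats map to close integers.
static uint32_t float_to_ordered(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);                    // reinterpret bits safely
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

static float ordered_to_float(uint32_t u) {
    u = (u & 0x80000000u) ? (u & 0x7FFFFFFFu) : ~u;   // invert the mapping
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Residual between the actual value and a data-dependent prediction supplied
// by a plug-in predictor (e.g. the previous sample or a Lorenzo-style estimate).
static uint32_t residual(float actual, float predicted) {
    return float_to_ordered(actual) - float_to_ordered(predicted);
}

// Exact reconstruction from prediction and residual: no information is lost.
static float reconstruct(float predicted, uint32_t r) {
    return ordered_to_float(float_to_ordered(predicted) + r);
}

The better the plug-in predictor, the closer the residual stays to zero (mod 2^32) and the fewer bits the entropy coder spends on it.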
We present results of lossless and lossy floating-point compression for scalar values of structured 2D and 3D grids, for fields defined over point sets, and for geometry coding of unstructured meshes. We compare our results with recent floating-point compression schemes to show that we achieve both state-of-the-art compression rates and speeds. The high compression speed can be attributed in part to the use of an optimized, high-speed entropy coder, described here. As a result, our compressor is able to produce substantial increases in effective I/O rate for data-heavy applications such as large scale scientific simulations.
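The optimized coder itself is described later in the paper; for readers unfamiliar with range coding, the following is a textbook carry-less range encoder in the style of Subbotin (a generic sketch, not the authors' implementation), illustrating the arithmetic that such speed optimizations target:

#include <cstdint>
#include <vector>

// Generic byte-wise range encoder (Subbotin style); the decoder is omitted.
// encode() narrows [low, low+range) to the sub-interval of the current
// symbol, given its cumulative frequency, frequency, and the total count.
class RangeEncoder {
public:
    void encode(uint32_t cumFreq, uint32_t freq, uint32_t totFreq) {
        range /= totFreq;
        low += cumFreq * range;
        range *= freq;
        // Renormalize: emit the top byte once it can no longer change, or
        // force the interval open when the range underflows below BOT.
        while ((low ^ (low + range)) < TOP ||
               (range < BOT && ((range = (0u - low) & (BOT - 1)), true))) {
            out.push_back(uint8_t(low >> 24));
            low <<= 8;
            range <<= 8;
        }
    }
    void flush() {                          // emit the remaining coder state
        for (int i = 0; i < 4; i++) {
            out.push_back(uint8_t(low >> 24));
            low <<= 8;
        }
    }
    std::vector<uint8_t> out;               // compressed byte stream
private:
    static constexpr uint32_t TOP = 1u << 24;
    static constexpr uint32_t BOT = 1u << 16;
    uint32_t low = 0, range = 0xFFFFFFFFu;
};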