Roaring: An Optimized Compressed Bitmap Technique That Improves Performance and Size


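"Consistently Faster and Smaller Compressed Bitmaps with Roaring" is a computer science paper dated April 19, 2016 (arXiv:1603.06549), written by D. Lemire, G. Ssi-Yan-Kai and O. Kaser. It studies compressed bitmap indexes, which are widely used in databases and search engines. Traditional formats such as BBC and WAH rely mainly on run-length encoding (RLE) to save space. When the data is not sorted, however, a hybrid technique called Roaring can perform markedly better. Roaring combines uncompressed bitmaps with packed arrays in a two-level tree, which suits large volumes of unordered data, and it has been adopted by several production platforms, including Apache Lucene, Apache Kylin and Druid.

Although Roaring is faster and smaller in most cases, run-length encoded bitmaps can be smaller when the data is sorted and contains long compressible runs. To handle this case, the paper proposes a new Roaring hybrid that adds run-length encoded containers alongside uncompressed bitmaps and packed arrays, so that the format adapts to the characteristics of the data while balancing speed and size. The main contribution is a format that performs more consistently across data conditions and saves storage in some scenarios. The authors support this with an empirical evaluation. Developers who understand how Roaring works and how it is tuned are better placed to choose a bitmap compression technique that fits their application and improves both response time and storage efficiency.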

ROARING: CONSISTENTLY FASTER AND SMALLER COMPRESSED BITMAPS

for most of the life of an application. An array minimizes storage. In a system such as Druid, the bitmaps are created, stored on disk and then memory-mapped as needed.

The structure of each container is straightforward (by design):

• A bitmap container is an object made of 1024 64-bit words (using 8 kB) representing an uncompressed bitmap, able to store all sets of 16-bit integers. The container can be serialized as an array of 64-bit words. We also use a counter to record how many bits are set to 1, and this counter is kept up-to-date.

Counting the number of 1-bits in a word can be relatively expensive if done naïvely, but modern processors have bit-count instructions (such as popcnt for x64 processors and cnt for the 64-bit ARM architecture) that can do this count, sometimes using as little as a single clock cycle. According to our tests, using dedicated processor instructions can be several times faster than using either tabulation or other conventional alternatives [16]. Henceforth, we refer to such a function as bitCount: it is provided in Java as the Long.bitCount intrinsic. We assume that the platform has a fast bitCount function.

• An array container is an object containing a counter keeping track of the number of integers followed by a packed array of sorted 16-bit unsigned integers. It can be serialized as a 16-bit counter followed by the corresponding number of 16-bit values.

We implement array containers as dynamic arrays that grow their capacity using a standard approach. That is, we keep a count of the used entries in an underlying array that typically has some excess capacity. When the array needs to grow beyond its capacity, we allocate a larger array and copy the data to this new array. Our allocation heuristic is as follows: when the capacity is small (less than 64 entries), we double the capacity; when the capacity is moderate (between 64 and 1067 entries), we multiply the capacity by 3/2; when the capacity is large (1067 entries and more), we multiply the capacity by 5/4. Furthermore, we never allocate more than the maximum needed (4096) and if we are within one sixteenth of the maximum (> 3840), then we allocate the maximum right away (4096) to avoid any future reallocation. A simpler heuristic where we double the capacity whenever it is insufficient would be faster, but it might use more memory than needed. When the array container is no longer expected to grow, the programmer can use a trim function to copy the data to a new array with no excess capacity.

• Our new addition, the run container, is made of a packed array of pairs of 16-bit integers. The first value of each pair represents a starting value, whereas the second value is the length of a run. For example, we would store the values 11, 12, 13, 14, 15 as the pair 11, 4 where 4 means that beyond 11 itself, there are 4 contiguous values that follow. In addition to this packed array, we need to maintain the number of runs stored in the packed array. Like the array container, the run container is stored in a dynamic array. During serialization, we write out the number of runs, followed by the corresponding packed array.

Unlike array or bitmap containers, a run container does not keep track of its cardinality; its cardinality can be computed on the fly by summing the lengths of the runs. In most applications, we expect the number of runs to be small: the computation of the cardinality, when needed, should be fast.
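The three container types above can be illustrated with a minimal Python sketch. It is not the Roaring library's code: all function names are hypothetical, `bin(w).count("1")` stands in for a hardware bitCount such as Java's Long.bitCount, and two details are assumptions where the text leaves room, namely that fractional capacities round down and that the "> 3840" test applies to the newly computed capacity. Because the stored run length excludes the starting value, the sketch counts each run as length + 1 values when computing the cardinality.

```python
# Sketch only: hypothetical helper names, not the Roaring library's API.

MAX_ARRAY_CAPACITY = 4096  # an array container never allocates more than this


def bitmap_cardinality(words):
    """Recount the set bits of a bitmap container given as 64-bit words.

    The text keeps a running counter instead; this shows the recount that
    a fast bitCount makes cheap.
    """
    return sum(bin(w).count("1") for w in words)


def grow_capacity(capacity):
    """Next capacity of an array container (assumes capacity >= 1)."""
    if capacity < 64:
        new_capacity = capacity * 2          # small: double
    elif capacity < 1067:
        new_capacity = capacity * 3 // 2     # moderate: times 3/2, rounded down
    else:
        new_capacity = capacity * 5 // 4     # large: times 5/4, rounded down
    # Within one sixteenth of the maximum (> 3840): allocate 4096 right away.
    if new_capacity > MAX_ARRAY_CAPACITY - MAX_ARRAY_CAPACITY // 16:
        new_capacity = MAX_ARRAY_CAPACITY
    return min(new_capacity, MAX_ARRAY_CAPACITY)


def runs_from_sorted(values):
    """Encode sorted, distinct 16-bit values as (start, length) pairs.

    length counts the values that follow start, so a lone value has length 0.
    """
    runs = []
    for v in values:
        if runs and v == runs[-1][0] + runs[-1][1] + 1:
            runs[-1][1] += 1      # v extends the current run
        else:
            runs.append([v, 0])   # v starts a new run
    return [tuple(r) for r in runs]


def run_cardinality(runs):
    """Cardinality computed on the fly: each run holds length + 1 values."""
    return sum(length + 1 for _, length in runs)
```

For instance, `runs_from_sorted([11, 12, 13, 14, 15])` yields the single pair `(11, 4)` from the example in the text, and `run_cardinality` of that pair is 5.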

No container ever uses much more than 8 kB of memory. Several such small containers fit the L1 CPU cache of most processors: the last Intel desktop processor to have less than 64 kB of total (data and code) L1 cache was the P6 created in 1995, whereas most mobile processors have 32 kB (e.g., NVidia, Qualcomm) or 64 kB (e.g., Apple) of total L1 cache.
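The 8 kB figure follows from the container layouts described earlier, as the short calculation below suggests. It counts payload bytes only; counters and per-object overhead are left out, and the function names are hypothetical.

```python
# Payload sizes implied by the container descriptions (sketch, payload only).

BITMAP_WORDS = 1024      # 64-bit words in a bitmap container
ARRAY_MAX_VALUES = 4096  # largest cardinality of an array container


def bitmap_payload_bytes():
    """A bitmap container always stores 1024 words of 8 bytes each."""
    return BITMAP_WORDS * 8


def array_payload_bytes(cardinality):
    """An array container stores one 16-bit (2-byte) value per integer."""
    return 2 * cardinality


def run_payload_bytes(number_of_runs):
    """A run container stores a pair of 16-bit integers (4 bytes) per run."""
    return 4 * number_of_runs


print(bitmap_payload_bytes())                 # 8192
print(array_payload_bytes(ARRAY_MAX_VALUES))  # 8192
```

A full array container and a bitmap container therefore occupy the same 8192 bytes of payload, which is consistent with 4096 being the point at which the two representations are exchanged.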

When starting from an empty Roaring bitmap, if a value is added, an array container is created. When inserting a new value in an array container, if the cardinality exceeds 4096, then the container is transformed into a bitmap container. In reverse, if a value is removed from a bitmap container so that its size falls under 4096 integers, then it is transformed into an array container. Whenever a container becomes empty, it is removed from the top-level key-value structure along with the corresponding key.
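These transitions can be sketched in Python as follows. This is a simplified model, not the library's implementation: the class names are hypothetical, run containers are omitted, and a plain dict keyed on the high 16 bits of each value stands in for the top-level key-value structure. The exact boundary for converting back to an array is an assumption: the sketch converts as soon as the cardinality no longer exceeds 4096, mirroring the insertion rule.

```python
import bisect

ARRAY_MAX = 4096  # an array container holds at most this many values


class Container:
    """Hypothetical container for 16-bit values, stored as array or bitmap."""

    def __init__(self):
        self.kind = "array"
        self.values = []    # sorted list, used while kind == "array"
        self.words = None   # 1024 64-bit words, used while kind == "bitmap"
        self.cardinality = 0

    def contains(self, v):
        if self.kind == "array":
            i = bisect.bisect_left(self.values, v)
            return i < len(self.values) and self.values[i] == v
        return (self.words[v >> 6] >> (v & 63)) & 1 == 1

    def add(self, v):
        if self.contains(v):
            return
        if self.kind == "array":
            bisect.insort(self.values, v)
        else:
            self.words[v >> 6] |= 1 << (v & 63)
        self.cardinality += 1
        if self.kind == "array" and self.cardinality > ARRAY_MAX:
            self._to_bitmap()

    def remove(self, v):
        if not self.contains(v):
            return
        if self.kind == "array":
            self.values.remove(v)
        else:
            self.words[v >> 6] &= ~(1 << (v & 63))
        self.cardinality -= 1
        # Assumed boundary: convert once the values fit in an array again.
        if self.kind == "bitmap" and self.cardinality <= ARRAY_MAX:
            self._to_array()

    def _to_bitmap(self):
        self.words = [0] * 1024
        for v in self.values:
            self.words[v >> 6] |= 1 << (v & 63)
        self.values, self.kind = None, "bitmap"

    def _to_array(self):
        self.values = [w * 64 + b for w, word in enumerate(self.words)
                       for b in range(64) if (word >> b) & 1]
        self.words, self.kind = None, "array"


class Bitmap32:
    """Toy top level: maps the high 16 bits of a value to a Container."""

    def __init__(self):
        self.containers = {}

    def add(self, x):
        key = x >> 16
        if key not in self.containers:
            self.containers[key] = Container()  # new containers start as arrays
        self.containers[key].add(x & 0xFFFF)

    def remove(self, x):
        key = x >> 16
        container = self.containers.get(key)
        if container is None:
            return
        container.remove(x & 0xFFFF)
        if container.cardinality == 0:
            del self.containers[key]  # drop the empty container and its key
```

Adding the values 0 through 4096 to a `Bitmap32` leaves a single bitmap container of cardinality 4097; removing any one of them turns it back into an array container.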
