Bitmap Index Techniques in Data Warehouses


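"Bitmap Indices for Data Warehouses" is a research paper by Kurt Stockinger and Kesheng Wu of Lawrence Berkeley National Laboratory and the University of California. It examines how bitmap indices can speed up query processing in data warehouse environments. A bitmap index represents the values of an attribute in a table as bitmaps: each possible attribute value gets its own bit array, in which a bit is set to 1 if the corresponding record has that value and to 0 otherwise. Complex queries then reduce to simple bitwise operations, which greatly speeds up query evaluation.

The paper first reviews the existing literature on bitmap indexing and groups it into three categories: encoding, compression, and binning. Encoding concerns how to represent the data efficiently with bitmaps. Compression aims to reduce storage space while preserving query performance. Binning divides a large range of values into smaller intervals, each with its own bitmap, to reduce the number of bitmaps.

The paper then presents an efficient bitmap compression algorithm and analyzes its space and time complexity on large real-world application data sets. The conventional view is that bitmap indices suit only low-cardinality attributes, where cardinality is the number of distinct attribute values. The authors show, however, that compressed bitmap indices remain efficient on high-cardinality attributes. In performance tests, bitmap indices answer queries significantly faster than projection indices, which are often regarded as the most efficient index type in data warehouses. This points to a clear advantage for bitmap indices on large data volumes and complex queries, particularly in data warehouse and OLAP (online analytical processing) settings.

Overall, the paper highlights the potential of bitmap indices for large data sets, and in particular the contribution of compression to saving storage space and improving query performance. For database administrators and data scientists, understanding the principles and applications of bitmap indices helps in tuning data warehouse performance and speeding up data analysis.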
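The claim that compression keeps high-cardinality indices practical rests on the fact that, with many distinct values, each individual bitmap is mostly zeros. The sketch below illustrates that space argument with plain run-length encoding. This is an assumption chosen for illustration and is not the compression algorithm presented in the paper; the function names and the sample bitmap are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def run_length_encode(bits: List[int]) -> List[Tuple[int, int]]:
    """Collapse a 0/1 sequence into (bit value, run length) pairs."""
    return [(bit, sum(1 for _ in run)) for bit, run in groupby(bits)]


def run_length_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (bit value, run length) pairs back into the 0/1 sequence."""
    return [bit for bit, length in runs for _ in range(length)]


# Hypothetical bitmap for one value of a high-cardinality attribute:
# only 3 of 1000 records carry this value, so the bitmap is mostly zeros.
num_records = 1000
sparse_bitmap = [0] * num_records
for row in (17, 18, 640):
    sparse_bitmap[row] = 1

runs = run_length_encode(sparse_bitmap)
print(runs)       # [(0, 17), (1, 2), (0, 621), (1, 1), (0, 359)]
print(len(runs))  # 5 runs instead of 1000 individual bits
```

This sketch only shows why sparse bitmaps shrink; it does not evaluate queries on the compressed form.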

(Comer, 1979), that are theoretically optimal for one-dimensional range queries, but most of them cannot be used to efficiently answer arbitrary multi-dimensional range queries.

The bitmap index in its various forms was used a long time before relational database systems or data warehousing systems were developed. Earlier on, the bitmap index was regarded as a special form of inverted files (Knuth, 1998). The bit-transposed file (Wong et al., 1985) is very close to the bitmap index currently in use. The name bitmap index was popularized by O’Neil and colleagues (O’Neil, 1987; O’Neil & Quass, 1997). Following the example set in the description of Model 204, the first commercial implementation of bitmap indices (O’Neil, 1987), many researchers describe bitmap indices as a variation of the B-tree index. To respect its earlier incarnation as inverted files, we regard a bitmap index as a data structure consisting of keys and bitmaps. Moreover, we regard the B-tree as a way to lay out the keys and bitmaps in files. Since most commercial implementations of bitmap indices come after the product already contains an implementation of a B-tree, it is only natural for those products to take advantage of the existing B-tree software. For new developments and experimental or research codes, there is no need to couple a bitmap index with a B-tree. For example, in a research program that implements many of the bitmap indexing methods discussed later in this chapter (FastBit, 2005), the keys and the bitmaps are organized as simple arrays in a binary file. This arrangement was found to be more efficient than implementing bitmap indices in B-trees or as layers on top of a DBMS (Stockinger et al., 2002; Wu et al., 2002).

The basic bitmap index uses each distinct value of the indexed attribute as a key, and generates one bitmap containing as many bits as the number of records in the data set for each key (O’Neil, 1987). Let the attribute cardinality be the number of distinct values present in a data set. The size of a basic bitmap index is relatively small for low-cardinality attributes, such as “gender,” “types of cars sold per month,” or “airplane models produced by Airbus and Boeing.” However, for high-cardinality attributes such as “temperature values in a supernova explosion,” the index sizes may be too large to be of any practical use. In the literature, there are three basic strategies to reduce the sizes of bitmap indices: (1) using more complex bitmap encoding methods to reduce the number of bitmaps or improve query efficiency, (2) compressing each individual bitmap, and (3) using binning or other mapping strategies to reduce the number of keys. In the remaining discussions, we refer to these three strategies as encoding, compression and binning, for short.
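The size argument above can be made concrete: an uncompressed basic bitmap index holds one bitmap of N bits per distinct value, so it grows linearly with the attribute cardinality. The sketch below computes that size and shows how binning, strategy (3), reduces the number of keys. The function names, record counts, bin boundaries, and sample values are all hypothetical, and Python integers stand in for bitmaps purely for brevity.

```python
from bisect import bisect_right
from typing import Dict, Sequence


def basic_index_bits(cardinality: int, num_records: int) -> int:
    """Uncompressed basic index size: one N-bit bitmap per distinct value."""
    return cardinality * num_records


def build_binned_index(column: Sequence[float],
                       boundaries: Sequence[float]) -> Dict[int, int]:
    """Build one bitmap per bin; bit i of a bitmap is set if row i falls in that bin.

    Bin 0 holds values below boundaries[0], bin k holds values with
    boundaries[k-1] <= v < boundaries[k], and the last bin holds the rest.
    """
    index: Dict[int, int] = {}
    for row, value in enumerate(column):
        bin_id = bisect_right(boundaries, value)
        index[bin_id] = index.get(bin_id, 0) | (1 << row)
    return index


# Size in bytes for one million records (hypothetical figures).
print(basic_index_bits(2, 10**6) // 8)      # 250000 (about 250 KB)
print(basic_index_bits(10**5, 10**6) // 8)  # 12500000000 (about 12.5 GB)

# Binning: six distinct values collapse into three keys.
values = [3.2, 17.5, 11.0, 25.1, 9.9, 20.0]
binned = build_binned_index(values, boundaries=[10.0, 20.0])
print(len(set(values)), len(binned))        # 6 3
```

The trade-off is that a binned bitmap only says which bin a record falls in, so a query whose range cuts through a bin has to check the raw values of the records in that bin.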

BITMAP INDEX DESIGN

Basic Bitmap Index

Bitmap indices are one of the most efficient indexing methods available for speeding up multi-dimensional range queries for read-only or read-mostly data (O’Neil, 1987; Rotem et al., 2005b; Wu et al., 2006). The queries are evaluated with bitwise logical operations that are well supported by computer hardware. For an attribute with c distinct values, the basic bitmap index generates c bitmaps with N bits each, where N is the number of records (rows) in the data set. Each bit in a bitmap is set to “1” if the attribute in the record is

Figure 1: Simple bitmap index with 6 bitmaps to represent 6 distinct attribute values.
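A minimal sketch of a basic bitmap index, assuming Python integers as bitmaps (bit i corresponds to row i), may help make the construction and the bitwise query evaluation concrete. The column names `temperature` and `pressure`, the sample data, and all function names are hypothetical. The first column has 6 distinct values, so its index holds 6 bitmaps.

```python
from typing import Dict, List, Sequence


def build_bitmap_index(column: Sequence[int]) -> Dict[int, int]:
    """Map each distinct value (key) to a bitmap; bit i is set if row i has that value."""
    index: Dict[int, int] = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def range_query(index: Dict[int, int], low: int, high: int) -> int:
    """OR together the bitmaps of all keys in the closed range [low, high]."""
    result = 0
    for key, bitmap in index.items():
        if low <= key <= high:
            result |= bitmap
    return result


def rows_of(bitmap: int) -> List[int]:
    """List the row numbers whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


temperature = [0, 3, 5, 1, 3, 2, 4, 5]  # 6 distinct values, 8 rows
pressure = [1, 1, 0, 2, 2, 0, 1, 2]
temp_index = build_bitmap_index(temperature)
pres_index = build_bitmap_index(pressure)

# Two-dimensional range query: 2 <= temperature <= 4 AND 1 <= pressure <= 2.
# Each dimension is answered with ORs; the dimensions are combined with one AND.
hits = range_query(temp_index, 2, 4) & range_query(pres_index, 1, 2)
print(len(temp_index))  # 6
print(rows_of(hits))    # [1, 4, 6]
```

Because each attribute is indexed separately and the results are combined with a single AND, the same indices serve any combination of attributes in a query.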

