(Comer, 1979), that are theoretically optimal for one-dimensional range queries, but most of
them cannot be used to efficiently answer arbitrary multi-dimensional range queries.
The bitmap index in its various forms was used a long time before relational database systems or
data warehousing systems were developed. Earlier on, the bitmap index was regarded as a
special form of inverted files (Knuth, 1998). The bit-transposed file (Wong et al., 1985) is very
close to the bitmap index currently in use. The name bitmap index was popularized by O'Neil
and colleagues (O’Neil, 1987; O’Neil & Quass, 1997). Following the example set in the
description of Model 204, the first commercial implementation of bitmap indices (O’Neil, 1987),
many researchers describe bitmap indices as a variation of the B-tree index. To respect its earlier
incarnation as inverted files, we regard a bitmap index as a data structure consisting of keys and
bitmaps. Moreover, we regard the B-tree as a way to layout the keys and bitmaps in files. Since
most commercial implementations of bitmap indices come after the product already contains an
implementation of a B-tree, it is only natural for those products to take advantage of the existing
B-tree software. For new developments and experimental or research codes, there is no need to
couple a bitmap index with a B-tree. For example, in a research program that implements many
of the bitmap indexing methods discussed later in this chapter (FastBit, 2005), the keys and the
bitmaps are organized as simple arrays in a binary file. This arrangement was found to be more
efficient than implementing bitmap indices in B-trees or as layers on top of a DBMS (Stockinger
et al. 2002; Wu et al. 2002).
The basic bitmap index uses each distinct value of the indexed attribute as a key, and generates
one bitmap containing as many bits as the number of records in the data set for each key (O’Neil,
1987). Let the attribute cardinality be the number of distinct values present in a data set. The
size of a basic bitmap index is relatively small for low-cardinality attributes, such as “gender,”
“types of cars sold per month,” or “airplane models produced by Airbus and Boeing.” However,
for high-cardinality attributes such as “temperature values in a supernova explosion,” the index
sizes may be too large to be of any practical use. In the literature, there are three basic strategies
to reduce the sizes of bitmap indices: (1) using more complex bitmap encoding methods to
reduce the number of bitmaps or improve query efficiency, (2) compressing each individual
bitmap, and (3) using binning or other mapping strategies to reduce the number of keys. In the
remaining discussions, we refer to these three strategies as encoding, compression and binning,
for short.
BITMAP INDEX DESIGN
Basic Bitmap Index
Bitmap indices are one of the most efficient
indexing methods available for speeding up multi-
dimensional range queries for read-only or read-
mostly data (O’Neil, 1987; Rotem et al., 2005b;
Wu et al., 2006). The queries are evaluated with
bitwise logical operations that are well supported
by computer hardware. For an attribute with c
distinct values, the basic bitmap index generates c
bitmaps with N bits each, where N is the number
of records (rows) in the data set. Each bit in a
bitmap is set to “1” if the attribute in the record is