ROARING: CONSISTENTLY FASTER AND SMALLER COMPRESSED BITMAPS 5
for most of the life of an application. An array minimizes storage. In a system such as Druid, the
bitmaps are created, stored on disk and then memory-mapped as needed.
The structure of each container is straightforward (by design):
• A bitmap container is an object made of 1024 64-bit words (using 8 kB) representing an
uncompressed bitmap, able to store all sets of 16-bit integers. The container can be serialized
as an array of 64-bit words. We also use a counter to record how many bits are set to 1, and
this counter is kept up-to-date.
Counting the number of 1-bits in a word can be relatively expensive if done naïvely, but
modern processors have bit-count instructions—such as popcnt for x64 processors and cnt
for the 64-bit ARM architecture—that can do this count, sometimes using as little as a single
clock cycle. According to our tests, using dedicated processor instructions can be several times
faster than using either tabulation or other conventional alternatives [16]. Henceforth, we refer
to such a function as bitCount: it is provided in Java as the Long.bitCount intrinsic. We
assume that the platform has a fast bitCount function.
• An array container is an object containing a counter keeping track of the number of integers
followed by a packed array of sorted 16-bit unsigned integers. It can be serialized as a 16-bit
counter followed by the corresponding number of 16-bit values.
We implement array containers as dynamic arrays that grow their capacity using a standard
approach. That is, we keep a count of the used entries in an underlying array that has typically
some excess capacity. When the array needs to grow beyond its capacity, we allocate a larger
array and copy the data to this new array. Our allocation heuristic is as follow: when the
capacity is small (less than 64 entries), we double the capacity; when the capacity is moderate
(between 64 and 1067 entries), we multiply the capacity by 3/2; when the capacity is large
(1067 entries and more), we multiply the capacity by 5/4. Furthermore, we never allocate
more than the maximum needed (4096) and if we are within one sixteenth of the maximum
(> 3840), then we allocate the maximum right away (4096) to avoid any future reallocation.
A simpler heuristic where we double the capacity whenever it is insufficient would be faster,
but it might use more memory than needed. When the array container is no longer expected
to grow, the programmer can use a trim function to copy the data to a new array with no
excess capacity.
• Our new addition, the run container, is made of a packed array of pairs of 16-bit integers. The
first value of each pair represents a starting value, whereas the second value is the length of a
run. For example, we would store the values 11, 12, 13, 14, 15 as the pair 11, 4 where 4 means
that beyond 11 itself, there are 4 contiguous values that follow. In addition to this packed array,
we need to maintain the number of runs stored in the packed array. Like the array container,
the run container is stored in a dynamic array. During serialization, we write out the number
of runs, followed by the corresponding packed array.
Unlike array or bitmap containers, a run container does not keep track of its cardinality;
its cardinality can be computed on the fly by summing the lengths of the runs. In most
applications, we expect the number of runs to be small: the computation of the cardinality,
when needed, should be fast.
No container ever uses much more than 8 kB of memory. Several such small containers fit the L1
CPU cache of most processors: the last Intel desktop processor to have less than 64 kB of total (data
and code) L1 cache was the P6 created in 1995, whereas most mobile processors have 32 kB (e.g.,
NVidia, Qualcomm) or 64 kB (e.g., Apple) of total L1 cache.
When starting from an empty Roaring bitmap, if a value is added, an array container is created.
When inserting a new value in an array container, if the cardinality exceeds 4096, then the container
is transformed into a bitmap container. In reverse, if a value is removed from a bitmap container
so that its size falls under 4096 integers, then it is transformed into an array container. Whenever
a container becomes empty, it is removed from the top-level key-value structure along with the
corresponding key.