174
1-4244-1027-4/07/$25.00 ©2007 IEEE
SIMD Vectorization of Histogram Functions
Asadollah Shahbahrami
1 , 2
, Ben Juurlink
1
, and Stamatis Vassiliadis
1
1
Computer Engineering Laboratory
2
Department of Electrical Engineering
Delft University of Technology Faculty of Engineering
2628 CD Delft, The Netherlands The University of Guilan
shahbahrami,benj,stamatis@ce.et.tudelft.nl Rasht, Iran
Abstract
Existing SIMD extensions cannot efficiently vectorize the
histogram function due to memory collisions. We propose
two techniques to avoid this problem. In the first, a hi-
erarchical structure of three levels is proposed. In order
to provide n-way parallelism, auxiliary arrays that have n
and n/2 subarrays are used in the first and second level,
respectively. The last level has the primary histogram ar-
ray. Indirect SIMD load and store instructions are designed
in order to access different elements of different subarrays.
The different subarrays in the lower levels are merged and
finally at the end, the calculated results are stored in the pri-
mary histogram array. In the second method, parallel com-
parators are used in order to count the number of subwords
within a media register that are the same. Thereafter, these
numbers are added to the values of the histogram array si-
multaneously. Experimental results obtained by extending
the SimpleScalar toolset show that proposed techniques im-
prove the performance compared to the fastest scalar ver-
sion by a factor of 7.37 and 5.52, respectively.
Keywords: Subword Parallelism, Multimedia Extensions,
Histogram Calculation.
1 Introduction
Histogram features are used in some applications such
as image and video retrieval, video sequence segmentation,
and objects classification in image processing and pattern
recognition [5]. Additionally, in [4] has been indicated that
using the color histogram features is the most suitable com-
pared to the texture and shape features for large databases
such as Web. This is due to its simplicity, invariant to image
rotation, and low storage requirements compared to the size
of the image.
Given an image of size N × M, a histogram is simply the
count of how many pixels of the image map to each element
This research was supported in part by the Netherlands Organization for
Scientific Research (NWO).
35 50 50 247 50 247 50 252
0 20 35 50 105 247 252 255
# of pixels
Intensity
Histogram
Figure 1. SIMD vectorization of histogram
leads to memory collisions when multiple
subwords contain the same value.
as shown in the code below:
for (i=0; i<N; i++)
for (j=0; j<M; j++)
Histogram[image[i][j]] +=1;
The size of the histogram array depends on the number of
bits per pixel (bpp). If bpp = n, the histogram array has 2
n
elements. We call this array the primary histogram array.
SIMD vectorization of the histogram computation, how-
ever, is a challenging problem. The most important rea-
son for this is memory collisions [1] as illustrated in Fig-
ure 1. Memory collisions increase the number of memory
accesses. In image and video processing collisions are com-
mon because there are many occurrences of the same pixel
value in either an image or a frame.
Existing SIMD architectures such as MMX [9] and
SSE [10] cannot efficiently vectorize this important func-
tion. The most important reason is that these SIMD exten-
sions cannot support indexed (indirect) load or store instruc-
tions. Hence, we propose two techniques to vectorize the
histogram calculation using subword processing. The first
method is called hierarchical structure. This structure has
three levels. In the first and second levels, auxiliary arrays
with different sizes and data type are used. The primary his-
togram array is used in the last level. We use n pixel values
as indirect pointers to these auxiliary arrays using SIMD