1.1.10 Integer versus float multiplication
The floating-point multiplier gives the highest bits of the product. Integer multiplication gives the
result modulo 2^b where b is the number of bits of the integer type used. As an example we square the
number 111111111 using a 32-bit integer type and floating-point types with 24-bit and 53-bit mantissa
(significand):
a = 111111111 // assignment
a*a == 12345678987654321 // true result
a*a == 1653732529 // result with 32-bit integer multiplication
(a*a)%(2**32) == 1653732529 // ... which is modulo (2**bits_per_int)
a*a == 1.2345679481405440e+16 // result with float multiplication (24 bit mantissa)
a*a == 1.2345678987654320e+16 // result with float multiplication (53 bit mantissa)
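A minimal C sketch of this comparison (the variable names and print formats are our choice; the float output depends on how the platform rounds 111111111 to a 24-bit significand):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t a = 111111111;      /* 32-bit integer */
    float af = 111111111.0f;     /* 24-bit significand */
    double ad = 111111111.0;     /* 53-bit significand */

    printf("integer (mod 2^32): %u\n", a * a);
    printf("float   (24-bit)  : %.16e\n", (double)(af * af));
    printf("double  (53-bit)  : %.16e\n", ad * ad);
    return 0;
}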
1.1.11 Double precision float to signed integer conversion
Conversion of double precision floats that have a 53-bit mantissa to signed integers via [11, p.52-53]
#define DOUBLE2INT(i, d) { double t = ((d) + 6755399441055744.0); i = *((int *)(&t)); }
double x = 123.0;
int i;
DOUBLE2INT(i, x);
can be a faster alternative to
double x = 123.0;
int i = x;
The constant used is 6755399441055744 = 2^52 + 2^51. The method is machine dependent as it relies on the
binary representation of the floating-point mantissa. Here it is assumed that the floating-point number
has a 53-bit mantissa with the most significant bit (which is always one for normalized numbers) omitted,
and that the address of the number points to the mantissa.
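A small self-contained sketch of the underlying mechanism, assuming IEEE 754 doubles and a little-endian memory layout (both assumptions are ours; the memcpy-based bit extraction is our addition to avoid the pointer cast). After the addition the value lies in [2^52, 2^53), so the exponent is fixed and the integer value sits in the low mantissa bits:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define DOUBLE2INT(i, d) { double t = ((d) + 6755399441055744.0); i = *((int *)(&t)); }

int main(void)
{
    double x = -123.0;                   /* a negative value to show the signed case */
    double t = x + 6755399441055744.0;   /* add 2^52 + 2^51 */

    uint64_t bits;
    memcpy(&bits, &t, sizeof(bits));     /* copy the bit pattern of the sum */
    printf("low 32 mantissa bits as int: %d\n", (int32_t)(uint32_t)bits);  /* -123 */

    int i;
    DOUBLE2INT(i, x);
    printf("DOUBLE2INT: %d\n", i);       /* -123 on little-endian machines */
    return 0;
}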
1.1.12 Optimization considerations
Never assume that some piece of code is the ‘fastest possible’. There is always another trick that can still improve
performance. Many factors influence performance, such as the number of CPU registers or the
cost of branches. Code that performs well on one machine might perform badly on another. The old
trick of swapping two variables without using a temporary is pretty much out of fashion today:
// a=0, b=0 a=0, b=1 a=1, b=0 a=1, b=1
a ^= b; // 0 0 1 1 1 0 0 1
b ^= a; // 0 0 1 0 1 1 0 1
a ^= b; // 0 0 1 0 0 1 1 1
// equivalent to: tmp = a; a = b; b = tmp;
However, under some conditions (like extreme register pressure) it may be the way to go. Note that if
both operands are the same memory location, the result is zero rather than a swap.
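A small C sketch (the helper xor_swap is ours, not from the text) shows both the swap and the effect of identical operands:

#include <stdio.h>

/* XOR-swap of two locations; zeroes the value if both pointers refer to the same object. */
static void xor_swap(unsigned *a, unsigned *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

int main(void)
{
    unsigned v[2] = { 7, 11 };
    xor_swap(&v[0], &v[1]);
    printf("%u %u\n", v[0], v[1]);   /* 11 7 */
    xor_swap(&v[0], &v[0]);          /* same location: the element is set to zero */
    printf("%u\n", v[0]);            /* 0 */
    return 0;
}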
The only way to find out which version of a function is faster is actual benchmarking (timing).
Performance also depends on the sequence of instructions surrounding the generated machine code, assuming that
all of these low-level functions get inlined. Studying the generated CPU instructions helps to understand
what happens, but can never replace benchmarking. This means that benchmarks of just the isolated
routine can at best give a rough indication; test your application using different versions of the routine
in question.
Never ever delete the unoptimized version of some code fragment when introducing a streamlined one.
Keep the original in the source. If something nasty happens (think of low-level software failures when
porting to a different platform), you will be very grateful for the chance to temporarily resort to the slow
but correct version.
Study the optimization recommendations for your CPU (like [11] and [12] for the AMD64, see also [144]).
You can also learn a lot from the documentation for other architectures.