Integer overflow is another security problem. The official C standard says that the behavior
of signed integers in case of overflow is "undefined". This allows the compiler to ignore
overflow or assume that it doesn't occur. In the case of the Gnu compiler, the assumption
that signed integer overflow doesn't occur has the unfortunate consequence that it allows
the compiler to optimize away an overflow check. There are a number of possible remedies
against this problem: (1) check for overflow before it occurs, (2) use unsigned integers -
they are guaranteed to wrap around, (3) trap integer overflow with the option -ftrapv, but
this is extremely inefficient, (4) get a compiler warning for such optimizations with option
-Wstrict-overflow=2, or (5) make the overflow behavior well-defined with option
-fwrapv or -fno-strict-overflow.
You may deviate from the above security advices in critical parts of the code where speed is
important. This can be permissible if the unsafe code is limited to well-tested functions,
classes, templates or modules with a well-defined interface to the rest of the program.
3 Finding the biggest time consumers
3.1 How much is a clock cycle?
In this manual, I am using CPU clock cycles rather than seconds or microseconds as a time
measure. This is because computers have very different speeds. If I write that something
takes 10 μs today, then it may take only 5 μs on the next generation of computers and my
manual will soon be obsolete. But if I write that something takes 10 clock cycles then it will
still take 10 clock cycles even if the CPU clock frequency is doubled.
The length of a clock cycle is the reciprocal of the clock frequency. For example, if the clock
frequency is 2 GHz then the length of a clock cycle is
A clock cycle on one computer is not always comparable to a clock cycle on another
computer. The Pentium 4 (NetBurst) CPU is designed for a higher clock frequency than
other CPUs, but it uses more clock cycles than other CPUs for executing the same piece of
code in general.
Assume that a loop in a program repeats 1000 times and that there are 100 floating point
operations (addition, multiplication, etc.) inside the loop. If each floating point operation
takes 5 clock cycles, then we can roughly estimate that the loop will take 1000 * 100 * 5 *
0.5 ns = 250 μs on a 2 GHz CPU. Should we try to optimize this loop? Certainly not! 250 μs
is less than 1/50 of the time it takes to refresh the screen. There is no way the user can see
the delay. But if the loop is inside another loop that also repeats 1000 times then we have
an estimated calculation time of 250 ms. This delay is just long enough to be noticeable but
not long enough to be annoying. We may decide to do some measurements to see if our
estimate is correct or if the calculation time is actually more than 250 ms. If the response
time is so long that the user actually has to wait for a result then we will consider if there is
something that can be improved.
3.2 Use a profiler to find hot spots
Before you start to optimize anything, you have to identify the critical parts of the program.
In some programs, more than 99% of the time is spent in the innermost loop doing
mathematical calculations. In other programs, 99% of the time is spent on reading and
writing data files while less than 1% goes to actually doing something on these data. It is
very important to optimize the parts of the code that matters rather than the parts of the
code that use only a small fraction of the total time. Optimizing less critical parts of the code