3.12 Execution unit throughput
There is an important distinction between the latency and the throughput of an execution
unit. For example, it takes five clock cycles to do a floating point addition on a Pentium 4.
But it is possible to start a new floating point addition every clock cycle. This means that if
each addition depends on the result of the preceding addition then you will have only one
addition every five clock cycles. But if all the additions are independent then you can have
one addition every clock cycle.
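The difference between a dependent and an independent sequence of additions can be sketched in C++ as follows. This is only an illustration under stated assumptions: the function names `sum_chained` and `sum_independent` and the choice of four accumulators are hypothetical, and the actual speed difference depends on the processor and on how the compiler optimizes the loop.

```cpp
#include <cstddef>

// One accumulator: each addition needs the result of the previous one,
// so the loop is limited by the latency of the floating point adder.
double sum_chained(const double* a, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

// Four accumulators (an assumed number): the four additions in the loop
// body do not depend on each other, so a new addition can start before
// the previous one has finished.
double sum_independent(const double* a, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    // Handle the remaining elements when n is not divisible by 4
    for (; i < n; i++) {
        s0 += a[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

Note that the second version adds the numbers in a different order, so the result may differ slightly in the last digits because of floating point rounding.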
The highest performance that can possibly be obtained in a computationally intensive
program is achieved when none of the time-consumers mentioned in the above sections
dominates and there are no long dependency chains. In this case, the performance is
limited by the throughput of the execution units rather than by the latency or by memory
access.
The execution core of modern microprocessors is split between several execution units.
Typically, there are two or more integer units, one floating point addition unit, and one
floating point multiplication unit. This means that it is possible to do an integer addition, a
floating point addition, and a floating point multiplication at the same time.
Code that does floating point calculations should therefore preferably have a balanced mix
of additions and multiplications. Subtractions use the same unit as additions. Divisions take
longer and use the multiplication unit. It is possible to do integer operations in-between
the floating point operations without reducing the performance because the integer
operations use a different execution unit. For example, a loop that does floating point
calculations will typically use integer operations for incrementing a loop counter, comparing
the loop counter with its limit, etc. In most cases, you can assume that these integer
operations do not add to the total computation time.
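A dot product is a simple example of a loop with such a mix. The sketch below is only an illustration; the function name `dot_product` is hypothetical, and the comments describe the division of work on a processor with separate units as described above, not what any particular compiler or processor will actually do.

```cpp
#include <cstddef>

// Each iteration does one floating point multiplication and one floating
// point addition. The loop counter is handled by integer operations.
double dot_product(const double* x, const double* y, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; i++) {  // integer increment and compare
        double product = x[i] * y[i];      // floating point multiplication
        sum += product;                    // floating point addition
    }
    return sum;
}
```

The multiplications here are independent of each other, but the additions to `sum` still form a dependency chain, so this loop may be limited by the latency of the addition rather than by the throughput.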
4 Performance and usability
A better performing software product is one that saves time for the user. Time is a precious
resource for many computer users and much time is wasted on software that is slow,
difficult to use, incompatible or error-prone. All these problems are usability issues, and I
believe that software performance should be seen in the broader perspective of usability. A
list of literature on usability is given on page 132.
This is not a manual on usability, but I think that it is necessary here to draw the attention of
software programmers to some of the most common obstacles to efficient use of software.
The following list points out some typical sources of frustration and waste of time for
software users as well as important usability problems that software developers should be
aware of.
• Big runtime frameworks. The .NET framework and the Java virtual machine are
frameworks that typically use far more resources than the programs they are
running. Such frameworks are frequent sources of resource problems and
compatibility problems, and they waste a lot of time during installation of the
framework itself, during installation of the program that runs under the framework,
during start of the program, and while the program is running. The main reason why
such runtime frameworks are used at all is for the sake of cross-platform portability.
Unfortunately, the cross-platform compatibility is not always as good as expected. I
believe that the portability could be achieved more efficiently by better
standardization of programming languages, operating systems, and APIs.
• Memory swapping. Software developers typically have more powerful computers
with more RAM than end users have. The developers may therefore fail to see the
excessive memory swapping and other resource problems that cause the resource-