by its instructions per cycle (IPC, not to be confused with interprocess communication, which shares the same acronym; we'll look at that topic in Chapter 9), while the latter value is measured by its clock speed.
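As a rough worked example (with made-up figures), a core that retires 4 instructions per cycle at a 3 GHz clock has a theoretical peak of 4 × 3 × 10⁹ = 1.2 × 10¹⁰ instructions per second.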
These two measures often trade off against each other when new computing units
are being designed. For example, the Intel Core series has a high IPC but a lower clock
speed, while the Pentium 4 chip has the reverse. GPUs, on the other hand, reach very
high aggregate instruction throughput by running thousands of simple execution units
in parallel, although each unit runs at a lower clock speed than a typical CPU core;
they also suffer from other problems, which we will outline later.
Furthermore, while increasing clock speed almost immediately speeds up all programs
running on that computational unit (because they are able to do more calculations per
second), a higher IPC can also drastically speed up computation by changing the level
of vectorization that is possible. Vectorization happens when a CPU is provided with
multiple pieces of data at a time and is able to operate on all of them with a single
instruction. This sort of CPU instruction is known as SIMD (Single Instruction,
Multiple Data).
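As a rough illustration in Python (NumPy is assumed to be installed; whether SIMD instructions are actually used depends on the NumPy build and the hardware), an array expression hands the whole batch of data to compiled code, which is where vectorization can happen:

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# One expression over whole arrays: the loop runs in compiled code
# over contiguous memory, where SIMD instructions can process
# several elements at once.
c = a + b

# The pure-Python equivalent feeds the CPU one element at a time,
# leaving no opportunity to vectorize across elements.
c_slow = [x + y for x, y in zip(a, b)]
```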
In general, computing units have been advancing quite slowly over the past decade (see
Figure 1-1). Clock speeds and IPC have both been stagnant because of the physical
limitations of making transistors smaller and smaller. As a result, chip manufacturers
have been relying on other methods to gain more speed, including hyperthreading,
more clever out-of-order execution, and multicore architectures.
Hyperthreading presents a virtual second CPU to the host operating system (OS), and
clever hardware logic tries to interleave two threads of instructions into the execution
units on a single CPU. When successful, gains of up to 30% over a single thread can be
achieved. Typically this works well when the units of work across both threads use
different types of execution unit—for example, one performs floating-point operations
and the other performs integer operations.
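One way to observe hyperthreading from Python is to compare how many CPUs the OS reports with how many physical cores exist; a minimal sketch, assuming the third-party psutil package is installed:

```python
import os

import psutil  # third-party: pip install psutil

logical = os.cpu_count()                    # CPUs the OS sees, including virtual ones
physical = psutil.cpu_count(logical=False)  # physical cores only

print(f"logical CPUs:  {logical}")
print(f"physical CPUs: {physical}")
# On a hyperthreaded machine, logical is typically twice physical.
```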
Out-of-order execution enables the chip itself, at runtime, to spot that some parts of a
linear program sequence do not depend on the results of a previous piece of work, and
therefore that both pieces of work could potentially occur in any order or at the same
time. As long as sequential results are presented at the right time, the program continues
to execute correctly, even though pieces of work are computed out of their programmed
order. This enables some instructions to execute when others might be blocked (e.g.,
waiting for a memory access), allowing greater overall utilization of the available
resources.
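The dependency idea can be sketched at the source level, though the actual reordering happens inside the CPU's hardware at runtime, not in Python:

```python
data = list(range(1_000_000))

# These two statements are independent: neither reads the other's
# result, so the hardware is free to overlap them. While the element
# load for `a` waits on memory, `b` can already be computed.
a = data[123_456] * 2
b = 3 + 4

# This statement depends on both results, so it must logically come
# last; the CPU still presents the results in program order.
c = a + b
```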
Finally, and most important for the higher-level programmer, is the prevalence of
multicore architectures. These architectures include multiple CPUs within the same
unit, which increases the total capability without running into barriers to making each
individual unit faster. This is why it is currently hard to find any machine with fewer
than two cores—in this case, the computer has two physical computing units that are
connected to each other. While this increases the total number of operations that can
be performed per second, it makes it harder to write code that actually takes advantage
of all the units at once.
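As a minimal sketch of spreading work across cores from Python (count_primes here is a made-up, CPU-bound stand-in, not an API from any library):

```python
from multiprocessing import Pool

def count_primes(limit):
    """CPU-bound stand-in: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Each chunk of work runs in its own process, so the chunks can
    # execute on separate physical cores rather than sharing one.
    with Pool() as pool:
        results = pool.map(count_primes, [50_000] * 4)
    print(results)
```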