CHAPTER 1
Introduction
A multithreaded architecture is one in which a single processor has the ability to follow multiple
streams of execution without the aid of software context switches. If a conventional processor (Fig-
ure 1.1(a)) wants to stop executing instructions from one thread and begin executing instructions
from another thread, it requires software to dump the state of the running thread into memory,
select another thread, and then load the state of that thread into the processor. That would typically
require many thousands of cycles, particularly if the operating system is invoked. A multithreaded
architecture (Figure 1.1(b)), on the other hand, can access the state of multiple threads in, or near,
the processor core. This allows the multithreaded architecture to quickly switch between threads,
and potentially utilize the processor resources more efficiently and effectively.
In order to achieve this, a multithreaded architecture must be able to store the state of multiple
threads in hardware—we refer to this storage as hardware contexts, where the number of hardware
contexts supported defines the level of multithreading (the number of threads that can share the
processor without software intervention).The state of a thread is primarily composed of the program
counter (PC), the contents of general purpose registers, and special purpose and program status
registers. It does not include memory (because that remains in place), or dynamic state that can be
rebuilt or retained between thread invocations (branch predictor, cache, or TLB contents).
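The architectural thread state described above, and the software context switch it implies on a conventional processor, can be sketched as follows. The struct layout (32 general purpose registers, a single status word) and the function names are illustrative assumptions, not a specific ISA or operating system interface:

```c
#include <stdint.h>
#include <string.h>

#define NUM_GPRS 32

/* The per-thread state a hardware context must hold: PC, general
 * purpose registers, and program status. Memory, caches, branch
 * predictor, and TLB contents are deliberately excluded. */
typedef struct {
    uint64_t pc;             /* program counter */
    uint64_t gpr[NUM_GPRS];  /* general purpose registers */
    uint64_t psr;            /* program status register */
} thread_state;

/* A software context switch: every field of the running thread is
 * written out to memory and every field of the next thread is read
 * back in. This save/select/load sequence, plus scheduler overhead
 * once the OS is invoked, is why a software switch costs thousands
 * of cycles, while a multithreaded core that already holds several
 * thread_state copies in hardware can switch in a cycle or less. */
void context_switch(thread_state *cpu, thread_state *save_area,
                    const thread_state *next)
{
    memcpy(save_area, cpu, sizeof *cpu); /* dump the running thread */
    memcpy(cpu, next, sizeof *cpu);      /* load the selected thread */
}
```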
Hardware multithreading is beneficial when there is a mismatch between the hardware support
for instruction level parallelism (ILP) and the level of ILP in an executing thread. More generally, if
we also want to include scalar machines, it addresses the gap between the peak hardware bandwidth
and the achieved software throughput of one thread. Multithreading addresses this gap because
it allows multiple threads to share the processor, making execution resources that a single thread
would not use available to other threads. In this way, insufficient instruction level parallelism is
supplemented by exploiting thread level parallelism (TLP). Just as a moving company strives never
to send a truck cross country with a partial load, instead grouping multiple households onto the
same truck, so a processor not fully utilized by the software represents a wasted resource and an
opportunity cost.
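A back-of-the-envelope calculation makes the gap concrete. The numbers here (a 4-wide issue machine, a per-thread IPC of 1.2) are assumed for illustration, not measurements:

```c
/* Fraction of peak issue slots filled by n identical threads sharing
 * the core, assuming (optimistically) that their demands add until
 * the machine saturates. issue_width is the peak hardware bandwidth
 * in instructions per cycle; thread_ipc is the throughput one thread
 * achieves on its own. */
double utilization(int n_threads, double thread_ipc, double issue_width)
{
    double u = n_threads * thread_ipc / issue_width;
    return u > 1.0 ? 1.0 : u; /* cannot exceed peak bandwidth */
}

/* One thread at IPC 1.2 on a 4-wide machine fills only
 * utilization(1, 1.2, 4.0) = 30% of the issue slots; in the best case,
 * three such threads fill utilization(3, 1.2, 4.0) = 90%, with the
 * extra slots supplied by TLP rather than ILP. */
```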
Hardware multithreading virtualizes the processor, because it makes a processor that might
otherwise look little different than a traditional single core appear to software as a multiprocessor.
When we add a second hardware context to a processor, a thread running on this context appears
to be running on a virtual core that has the hardware capabilities of the original core minus those
resources being used by the first thread, and vice versa. Thus, by constructing additional virtual cores
out of otherwise unused resources, we can achieve the performance of a multiple-core processor at
a fraction of the area and implementation cost. But this approach also creates challenges, as the