Mixed Model Universal
Software Thread-Level Speculation
Zhen Cao and Clark Verbrugge
School of Computer Science, McGill University
Montr
´
eal, Qu
´
ebec, Canada H3A 0E9
Email: zhen.cao@mail.mcgill.ca, clump@cs.mcgill.ca
Abstract—Software approaches to Thread-Level Speculation
(TLS) have been recently explored, bypassing the need for
specialized hardware designs. These approaches, however, tend
to focus on source or VM-level implementations aimed at spe-
cific language and runtime environments. In addition, previous
software approaches tend to make use of a simple thread forking
model, reducing their ability to extract substantial parallelism
from tree-form recursion programs such as depth-first search and
divide-and-conquer. This paper proposes a Mixed forking model
Universal software-TLS (MUTLS) system to overcome these
limitations. MUTLS is purely based on the LLVM intermediate
representation (IR), a language and architecture independent IR
that supports more than 10 source languages and target archi-
tectures by many projects. MUTLS maximizes parallel coverage
by applying a mixed forking model that allows all threads to
speculate, forming a tree of threads. We evaluate MUTLS using
several C/C++ and Fortran benchmarks on a 64-core machine. On
3 computation intensive applications we achieve speedups of 30 to
50 and 20 to 50 for the C and Fortran versions, respectively. We
also observe speedups of 2 to 7 for memory intensive applications.
Our experiments indicate that a mixed model is preferable for
parallelization of tree-form recursion applications over the simple
forking models used by previous software-TLS approaches. Our
work also demonstrates that actual speedup is achievable on
existing, commodity multi-core processors while maintaining the
flexibility of a highly generic implementation context.
Keywords—Thread-Level Speculation; Parallelization; Forking
Model
I. INTRODUCTION
Thread-level speculation (TLS), or speculative multithread-
ing (SpMT) is a safety-guaranteed approach to automatic or
implicit parallelization. Speculative threads are optimistically
launched at fork points, executing a code sequence from join
points well ahead of their parent thread. Safety is preserved
in this speculative model by buffering reads and writes of
the speculative thread. Once the parent thread reaches the
join point the latter may be joined, committing speculative
writes to main memory and merging its execution state into
the parent thread, provided no read conflicts have occurred.
In the presence of conflicts the speculative child execution is
discarded or rolled back for re-execution by the parent.
Thread-level speculation has received significant attention
in terms of hardware development as a feasible technique for
automatic parallelization [4], [17], [16]. Software-only designs
have been proposed, and have the advantage of applying to
existing, commodity multiprocessors, but the immediacy of
application requires some tradeoff in terms of increased over-
head and compilation complexity, with existing research efforts
based on prototype, language-specific implementations [12],
[10]. Realistic and convincing evaluation of such designs, how-
ever, requires consideration of a full compiler infrastructure,
one that enables both deep investigation and application to a
variety of compilation contexts.
Fundamentally, TLS approaches differ in terms of forking
models: how they create and manage speculative threads.
Two main forking models exist, in-order, and out-of-order,
and existing software models have been primarily based on
one or the other of these strategies, which allow for good
exploitation of parallelism in loops and deep method calls
respectively, as discussed in section II. These simple forking
models, however, have limitations with respect to the ability
to extract parallelism, and a reliance on pure in-order or pure
out-of-order design limits the amount of parallelism that can
be found in more complex programs, including ones that make
extensive use of tree-form recursion, such as found in depth-
first search and divide-and-conquer programs.
In this work we propose the Mixed-model Universal soft-
ware-TLS (MUTLS) system to overcome both limitations of
existing software-TLS approaches. First, MUTLS uses a mixed
forking model to maximize the potential to extract parallelism
in more general classes of programs. Second, MUTLS is
universal in that it is language and architecture neutral. Our
approach is to build a pure software TLS design using the
popular LLVM compiler framework [1]. We integrate our de-
sign into LLVM’s machine and language-agnostic intermediate
representation (IR), enabling generic application of TLS to
arbitrary input and output contexts. This has the advantage of
providing a full and non-trivial compiler context for evaluating
TLS, as well as allowing the full range of source and hardware
pairings enabled by the LLVM framework.
Our design is demonstrated and tested by modifying front-
ends for C/C++ and Fortran to support user-driven speculation.
From this we are able to generate native executables (or
JIT-based execution) for non-trivial benchmarks to evaluate
performance, illustrating the potential of our approach as a
means to explore and compare the use of TLS in different lan-
guage contexts. Software TLS faces significant challenges in
terms of balancing overhead concerns with the many possible
design decisions possible in TLS implementation. Our system
simplifies this research exploration by allowing for practical
experimentation within an optimizing compiler context. This
paper has the following specific contributions.
• We describe MUTLS, the first software-TLS implementa-
tion on a source language and target architecture independent