the US Department of Defense provides services for
integrating a wide variety of simulator implementations,
including space and/or time parallel (conservative,
optimistic) discrete event simulations, and time-stepped
continuous simulations. However, the architecture has
been designed for interoperation of coarse-grained
integration entities, such as distributed programs communicating
over the network. As such, it is not optimized for
integration of fine-grained entities, as in the hosting of
multiple event-oriented logical processes and/or threads
within a single UNIX process. In particular, primitives
to facilitate efficient process scheduling are not
addressed in the standard; such primitives turn out to be
the key to efficient execution of fine-grained
autonomous entities.
The work more closely related to our present subject
is that of Jha and Bagrodia[13], in which a unified framework
is presented that permits optimistic and conservative
protocols to interoperate and alternate dynamically. (A
variation of Jha and Bagrodia’s protocols is later
discussed in [14], but in the context of VLSI
applications.)
applications). High-level algorithms are presented in
[13] that elegantly state the problem along with their
solution approach. However, they do not address
implementation details or report performance data. Their
treatment provides a proof of correctness, but lacks an
implementation approach and a study of runtime
performance implications†. Our work differs in that we
are interested in defining the interface in a way that
guarantees efficient implementation, and we describe
details for a high-performance implementation of such a
unified interface. Some of our terms share their
definitions with analogous terms in their work, but our
interface uses fewer primitives and diverges in semantics
for others. For example, our interface does not require
the equivalent of their Earliest Output Time (EOT).
Similarly, in contrast to their need for lookahead, we do
not require that the application always specify a non-
zero lookahead.

† It is commonly acknowledged that, in high-performance
parallel/distributed execution, “the devil is in fact in the
details”.
A variety of parallel/distributed software systems are
available to support distributed conservative execution.
However, very few software systems exist that support
distributed optimistic simulation. Even fewer operational
systems (almost none that we are aware of) are available
for switching between conservative and optimistic
modes, either at compile time or at runtime.
SPEEDES[15] is a commercial optimistic simulation
framework that is capable of distributed execution;
however, it has not been shown to be suitable for high-
performance execution of fine-grained applications. In
fact, some evidence indicates that its runtime and
memory performance are not optimized for fine-grained
distributed applications. GTW[7] and ROSS[16]
are representative of high-performance implementations
of optimistic simulators, but they are restricted to
parallel execution on symmetric shared memory
multiprocessor (SMP) platforms. This constraint limits
the user’s choice of hardware as well as scalability. An
exception is the WARPED simulator[17], a shared-
memory time warp system extended to execute on
distributed memory platforms, but it has been evaluated
on relatively small hardware configurations. We are
interested in scalable execution on large-scale computing
platforms, such as large clusters (hundreds) of quad-
processor SMP machines typically available in
supercomputing installations for academic research. The
cluster-of-SMPs platform is more appealing because it is
less expensive than a single SMP system with a
comparable number of processors.
We note that, while the possibility of switching
between types of protocol is not new, our parsimonious
API and our high-performance implementation
approach are novel.
3. Micro-Kernel Concepts
In the micro-kernel view, simulation processes‡ are
fully autonomous entities. Simulation processes
communicate by sending and receiving events to/from
other processes. They are free to determine for
themselves when and in what internal order they would
process their received events.
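To make this concrete, the following is a minimal C++ sketch of what a simulation-process interface could look like under this view. The names used here (Event, SimulationProcess, receive, execute, send) are illustrative assumptions for exposition only, not the interface defined by this paper.

// Illustrative sketch only: type and method names are assumptions,
// not the micro-kernel API defined in this paper.
typedef double SimTime;    // simulation timestamp type (assumed)
typedef int    ProcessId;  // kernel-assigned process identifier (assumed)

struct Event {
    ProcessId source;       // sending simulation process
    ProcessId destination;  // receiving simulation process
    SimTime   timestamp;    // receive timestamp
    // Application-defined payload would follow; the kernel never examines it.
};

// A simulation process is a fully autonomous entity: the kernel delivers
// events to it, but the process alone decides when, and in what internal
// order, its received events are processed.
class SimulationProcess {
public:
    virtual ~SimulationProcess() {}

    // Invoked by the kernel to hand over an incoming event.
    virtual void receive(const Event& e) = 0;

    // Invoked by the kernel to let the process advance; the process
    // chooses which of its buffered events to process at this point.
    virtual void execute() = 0;

protected:
    // Hands a newly generated event to the kernel for routing.
    void send(const Event& /*e*/) { /* forwarded to the micro-kernel (omitted in this sketch) */ }
};

Whether a process handles its events conservatively or optimistically is thus a decision internal to the process (or to protocol-specific services attached to it), consistent with the autonomy described above.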
The micro-kernel does not process events by itself; it
acts only as a router of events. In particular, it
does not generate, consume or buffer any events. It
does not examine event contents, except for the event’s
header (source, destination and timestamp§). The micro-
kernel does not distinguish between regular events,
retraction events, anti-events or multicast events. It also
does not perform event buffer management (memory
reuse, fossil collection, etc.), in contrast to traditional
parallel/distributed simulation engines. The distinctions
among event types and their associated optimizations are
deferred to protocol-specific services outside the kernel
proper. The responsibility of the micro-kernel is restricted
to providing services that enable the simulation processes
to communicate efficiently with each other and to
collectively accomplish “asymptotic” time-ordered
processing of events.
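As a rough illustration of this router-only role, the following sketch (reusing the illustrative Event and SimulationProcess types from the previous sketch) shows a kernel that inspects only an event’s destination header field and forwards the event accordingly; MicroKernel and its methods are again assumed names, not the implementation described in this paper.

#include <vector>

// Illustrative sketch only: depicts the router-only role of the kernel.
class MicroKernel {
public:
    // Registers a simulation process and returns its identifier.
    ProcessId add_process(SimulationProcess* p) {
        processes_.push_back(p);
        return static_cast<ProcessId>(processes_.size() - 1);
    }

    // Routes an event to its destination.  Only the header fields
    // (source, destination, timestamp) are examined; the kernel does not
    // generate, consume or buffer events, and it treats regular events,
    // retractions, anti-events and multicasts identically.  Buffer
    // management and fossil collection are left to protocol-specific
    // services outside the kernel.
    void route(const Event& e) {
        processes_[e.destination]->receive(e);
    }

private:
    std::vector<SimulationProcess*> processes_;  // registered processes
};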
‡ Traditional PDES literature refers to each distinct
communicating entity in a simulation as a “logical
process”. We use the terms “logical process” and
“simulation process” interchangeably.
§ The timestamp of an event (also called its “receive
timestamp”) is the simulation time at which its receiver
processes it.