Background Optimization in Full System Binary
Translation
Roman A. Sokolov
MCST CJSC
Moscow, Russia
Email: roman.a.sokolov@gmail.com
Alexander V. Ermolovich
Intel CJSC
Moscow, Russia
Email: karbo@pvk13.org
Abstract—Binary translation and dynamic optimization are
widely used to provide compatibility between legacy and promis-
ing upcoming architectures on the level of executable binary
codes. Dynamic optimization is one of the key contributors to
dynamic binary translation system performance. At the same
time it can be a major source of overhead, both in terms of
CPU cycles and whole system latency, as long as optimization
time is included in the execution time of the application under
translation. One way to eliminate dynamic optimization
overhead is to perform optimization simultaneously with the
execution, in a separate thread. In this paper we present an
implementation of this technique in a full system dynamic binary
translator. For this purpose, an infrastructure for multithreaded
execution was implemented in the binary translation system. This
allowed running dynamic optimization in a separate thread
independently of and concurrently with the main thread of
execution of binary codes under translation. Depending on the
computational resources available, this is achieved either by
interleaving the two threads on a single processor core or by
moving the optimization thread to an underutilized processor core.
In the first case the latency introduced to the system by a
computationally intensive dynamic optimization is reduced. In the
second case overlapping of execution and optimization threads
also results in elimination of optimization time from the total
execution time of original binary codes.
I. INTRODUCTION
Technologies of binary translation and dynamic optimiza-
tion are widely used in modern software and hardware com-
puting systems [1]. In particular, dynamic binary translation
systems (DBTS) comprising the two serve as a solution to
provide compatibility between widely used legacy and promis-
ing upcoming architectures on the level of executable binary
codes. In the context of binary translation these architectures
are usually referred to as source and target, respectively.
DBTSs execute binary codes of the source architecture on
target architecture hardware whose instruction set architecture
(ISA) is incompatible with it. They translate executable codes
incrementally (as opposed to whole application static
compilation), interleaving translation with execution of the
generated translated codes.
One of the key requirements that every DBTS has to meet
is that the performance of source codes executed through
binary translation must be comparable to, or even exceed, the
performance of native execution (that is, execution on source
architecture hardware).
An optimizing translator is usually employed to achieve higher
DBTS performance. It makes it possible to generate highly
efficient target architecture codes that fully utilize the
architectural features introduced to support binary translation.
In addition, dynamic optimization can benefit from actual
information about the run-time behavior of executables, which
static compilers usually do not possess.
At the same time dynamic optimization can incur significant
overhead as long as optimization time is included in the
execution time of the application under translation. Total
optimization time can be significant and will not necessarily
be compensated by the speed-up of the translated codes if the
application run time is too short.
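The trade-off can be illustrated with a simple break-even
estimate. The sketch below is not taken from the system described
in this paper: the function name and the cost model are
hypothetical. It assumes that a code region costs t_unopt time
units per execution before optimization and t_opt after it, and
that optimizing the region costs t_translate time units once.

```python
import math


def break_even_executions(t_translate, t_unopt, t_opt):
    """Smallest number of executions after which optimization pays off.

    Hypothetical cost model: optimization is worthwhile once
    t_translate + n * t_opt < n * t_unopt.
    Returns None if the optimized code is not faster at all.
    """
    gain = t_unopt - t_opt  # time saved per execution of the region
    if gain <= 0:
        return None
    # Smallest integer n that is strictly greater than t_translate / gain.
    return math.floor(t_translate / gain) + 1


# Illustrative numbers only: optimizing costs 10000 units and saves
# 6 units per execution of the region.
n = break_even_executions(t_translate=10000, t_unopt=10, t_opt=4)
```

Under these assumed numbers a region executed fewer than n times
never recovers the cost of its optimization, which is the
situation of a short-running application.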
Also, the operation of the optimizing translator can worsen the
latency (i.e., increase the pause time) of an interactive
application or operating system under translation. By latency we
mean the response time of the computer system to external events
such as asynchronous hardware interrupts from attached I/O
devices and interfaces. This characteristic is as important as
overall performance for the end user, for the operation of
attached hardware, and for other computers across the network.
Full system dynamic binary translators have to provide low
latency of operation as well. Binary translation systems of
this class aim to implement all the semantics and the behavior
model of the source architecture and to execute the entire
hierarchy of system-level and application-level software,
including BIOS and operating systems. They exclusively control
all the computer system hardware and its operation. Throughout
this paper we will also refer to this type of binary translation
system as a virtual machine level (or VM-level) binary translator
(as opposed to application-level binary translators).
One recognized technique to reduce dynamic optimization
overhead is to perform optimization simultaneously
(concurrently) with the execution of original binary codes by
utilizing idle computational resources or free cycles.
It has been used in a number of dynamic binary translation and
optimization systems [2], [3], [4], [5], [6], [7], [8]. We will
refer to this method as background optimization (as opposed to
consequent optimization, in which optimizing translation
interrupts execution and utilizes processor time exclusively
until it completes).
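The difference between the two schemes can be sketched in a few
lines of Python. This is only an illustration under stated
assumptions, not the implementation presented in this paper: the
function name, the hot_threshold parameter and the use of a
request queue are hypothetical. The main thread keeps executing
and only hands hot regions over to an optimizer thread.

```python
import queue
import threading


def run_with_background_optimizer(trace, hot_threshold=3):
    """Execute region ids from `trace`; optimize hot ones in another thread."""
    requests = queue.Queue()   # hot regions waiting to be optimized
    optimized = set()          # regions whose optimized code is installed
    lock = threading.Lock()    # guards `optimized`

    def optimizer():
        while True:
            region = requests.get()
            if region is None:  # sentinel: no more requests will arrive
                break
            # Placeholder for the expensive optimizing translation.
            with lock:
                optimized.add(region)

    worker = threading.Thread(target=optimizer)
    worker.start()

    counts = {}
    for region in trace:
        counts[region] = counts.get(region, 0) + 1
        if counts[region] == hot_threshold:
            requests.put(region)  # hand off; do not wait for the result
        with lock:
            use_optimized = region in optimized
        # A real translator would now run either the optimized or the
        # non-optimized translation of `region`, depending on use_optimized.

    requests.put(None)
    worker.join()
    return optimized


hot = run_with_background_optimizer(["a", "b", "a", "a", "b"])
```

In the consequent scheme the placeholder work would run inside
the main loop and stall execution until it finished. Note that
standard CPython threads take turns on the interpreter, so this
sketch resembles interleaving on a single core rather than true
parallel execution on a second core.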
This paper describes an implementation of background
optimization in a VM-level dynamic binary translation system.
This is achieved by separating optimizing translation from
execution flow into an independent thread which can then con-