SOSP ’17, October 28, 2017, Shanghai, China Jin Tack Lim, Christoffer Dall, Shih-Wei Li, Jason Nieh, and Marc Zyngier
3 PARAVIRTUALIZATION FOR ARCHITECTURE EVALUATION
Unfortunately, ARMv8.3 hardware is not available, and the
newest publicly available ARM hardware is still v8.0. As
architectural support for virtualization is increasingly com-
mon, understanding the performance of these features is
important, ideally before they become set in production hard-
ware. However, evaluating new architecture features for vir-
tualization is challenging because of costs associated with
prototyping new hardware and the need to understand the
interaction of both hardware and software. Chip vendors use
cycle-accurate simulators to measure performance, but they
are typically many orders of magnitude slower than real hard-
ware, making it hard to evaluate real-life workloads. Booting
a full virtualization stack including the hypervisor and VM
can take days, and even then, measuring key application per-
formance characteristics such as fast I/O performance using
10G Ethernet is still not possible. Furthermore, simulators of
commercial architecture designs are themselves quite com-
plex to build and often closed and proprietary, limiting their
availability in practice. Software developers often can only
use simpler architecture models before hardware is available,
at the cost of not being able to measure any real architecture
performance.
To overcome this challenge, we introduce an existing idea,
paravirtualization, in a new context. Paravirtualization allows
for a software interface to a VM that differs slightly
from the underlying hardware [46]. It is used to make hypervisors
simpler and faster by avoiding certain architecture
features that are complex or difficult to virtualize efficiently.
We instead use paravirtualization to allow us to build hy-
pervisors using new architecture features that do not exist
on current hardware, and measure the performance of a full
virtualization stack using new architecture features at native
execution speeds on existing hardware.
Paravirtualization to evaluate new architecture features is
only possible when the performance and functionality of the
proposed feature can be closely emulated using instructions
supported by available hardware. For core virtualization support
in the architecture, changes often involve traps, either
by adding features to trap on instructions that previously
did not trap, or by adding logic to avoid costly traps. In
both cases, paravirtualization can be used to replace instruc-
tions inside the VM with other ones supported by available
hardware such that the resulting behavior and performance
closely mimic that of a proposed architectural change.
For example, as discussed in Section 2, current ARM server
hardware does not support nested virtualization, because
when a hypervisor runs inside a VM on top of another hy-
pervisor, various instructions that it executes do not trap to
the underlying hypervisor for proper execution, but instead
simply fail improperly. However, if we replace those hyper-
visor instructions with instructions that do trap on current
hardware and the trap cost is expected to remain similar in
future hardware, we can obtain similar relative performance
to future hardware that supports nested virtualization with
correct trapping behavior.
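The instruction replacement described above can be sketched as a toy rewrite pass. This is a minimal illustration, not the implementation used in this work: the `Insn` type, the operation names in `HYP_ONLY_OPS`, and the `explicit_trap` pseudo-instruction are all hypothetical stand-ins and do not correspond to real ARM encodings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """One instruction in a toy guest hypervisor instruction stream."""
    op: str
    operands: str = ""


# Hypothetical operation names: hypervisor instructions assumed NOT to trap
# to the host hypervisor when executed inside a VM on current hardware.
HYP_ONLY_OPS = frozenset({"hyp_sysreg_read", "hyp_sysreg_write", "hyp_eret"})


def paravirtualize(stream):
    """Replace each non-trapping hypervisor instruction with an explicit trap.

    The original instruction is kept as the operand of the trap so that the
    host hypervisor could identify and emulate it.
    """
    rewritten = []
    for insn in stream:
        if insn.op in HYP_ONLY_OPS:
            original = f"{insn.op} {insn.operands}".strip()
            rewritten.append(Insn("explicit_trap", original))
        else:
            rewritten.append(insn)  # ordinary instructions are left untouched
    return rewritten


def count_traps(stream):
    """Number of explicit traps the rewritten stream would take."""
    return sum(1 for insn in stream if insn.op == "explicit_trap")


guest = [
    Insn("add", "x0, x1"),
    Insn("hyp_sysreg_write", "reg_a, x0"),
    Insn("hyp_eret"),
]
pv_guest = paravirtualize(guest)
print(count_traps(pv_guest))  # 2
```

The rewritten stream has the same length as the original, so the guest hypervisor takes one trap wherever future hardware would be expected to trap the original instruction.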
There are two key assumptions in this example. First,
the approach is useful for evaluating the relative performance
of an architecture feature compared to something else, not
to estimate absolute performance of future hardware. For
example, the approach can provide an accurate evaluation
of the overhead of nested virtualization compared to native
execution.
Second, the approach assumes that certain types of traps
are interchangeable in terms of performance. For example, on
ARM, the trap cost using an explicit trap instruction should
be similar to the cost of any system register access instruction
that traps. Only the cost of the trap itself needs to remain
similar; the overall cost of handling the respective trap can
be quite different. This assumption is likely to be true in
most cases and we have validated it on ARM hardware, as
discussed in Section 5.
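The two assumptions can be made concrete with a short sketch. The function names, the 10% tolerance, and all cycle counts below are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
def relative_overhead(measured_cycles, native_cycles):
    """Cost of a configuration normalized to native execution."""
    if native_cycles <= 0:
        raise ValueError("native_cycles must be positive")
    return measured_cycles / native_cycles


def traps_interchangeable(trap_a_cycles, trap_b_cycles, tolerance=0.10):
    """True if two trap costs differ by at most `tolerance` (relative)."""
    larger = max(trap_a_cycles, trap_b_cycles)
    if larger <= 0:
        raise ValueError("trap costs must be positive")
    return abs(trap_a_cycles - trap_b_cycles) / larger <= tolerance


# Hypothetical cycle counts for the trap itself, excluding handling.
explicit_trap = 100  # trap taken by an explicit trap instruction
sysreg_trap = 105    # trap taken by a trapping system register access

print(traps_interchangeable(explicit_trap, sysreg_trap))                # True
print(relative_overhead(measured_cycles=30_000, native_cycles=10_000))  # 3.0
```

Only the ratio to native execution is reported, which matches the first assumption: the approach estimates relative overhead rather than absolute cycle counts on future hardware.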
Using this approach, it becomes possible to efficiently evaluate
the performance of full virtualization stacks interacting
with fast I/O peripherals, using many CPU cores, and with
real-world workloads. It avoids the extremely slow perfor-
mance, complexity, and limited availability of cycle-accurate
simulators for recent architecture versions of commercial
CPUs. Perhaps more importantly, the approach allows co-
design and rapid prototyping of software and architecture
together, reducing the long feedback loops common today, in which
the performance of full software stacks is not known until
full OS support and hardware are released, long after
the architecture design phase takes place.
4 KVM/ARM NESTED VIRTUALIZATION FOR ARMV8.3
Because ARMv8.3 hardware is not yet available, we leverage
our paravirtualization approach discussed in Section 3 to
allow us to design, implement, and evaluate the first ARM
hypervisor to support nested virtualization using ARMv8.3
architectural support on existing ARMv8.0 hardware. Since
both ARM and x86 provide a single level of architectural
virtualization support, we take an approach similar to Turtles [10]
for supporting nested virtualization on x86, where
multiple levels of virtualization are multiplexed onto the sin-
gle level of architectural support available. We have imple-
mented nested virtualization support on ARM by modifying
KVM/ARM [18], the widely-used mainline Linux ARM hypervisor.
There are two kinds of modifications: (1) changes
to KVM/ARM as a host hypervisor to support running guest