[Figure 1: Resource utilization (CPU [%], MEM [%], MBW [%]) of Host A and Host B over time (s) under Traditional DRM.]
[Figure 2: Resource utilization (CPU [%], MEM [%], MBW [%]) of Host A and Host B over time (s) under Traditional DRM + MBW-awareness.]
[Figure 3: IPC performance of vm01-vm07 (STREAM) and vm08-vm14 (gromacs) under Traditional DRM and Architecture-aware DRM (HM is the harmonic mean; annotation: 49.2%).]
that do not contend for the same shared resource are mapped
to the same socket [12, 45, 60]. Our focus, in this work, is
not on a single server, but on a cluster of servers. We ex-
plore VM migration across nodes, which is complementary
to migrating applications/VMs across sockets.
2.2 Limitations of Traditional DRM Schemes
To address the VM-to-Host mapping challenge, prior
works [23, 27–31, 34, 56, 72] have proposed to manage
the physical resources by monitoring operating-system-level
metrics (such as CPU utilization and memory capacity demand)
and appropriately mapping VMs to hosts such that the uti-
lization of CPU/memory resources is balanced across differ-
ent hosts. While these schemes have been shown to be effec-
tive at CPU/memory resource scheduling and load balanc-
ing, they have a fundamental limitation: they are not aware of microarchitecture-level shared resource interference.
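For concreteness, the following sketch (in Python, with assumed thresholds and a hypothetical host-metrics structure, not taken from any particular DRM product) illustrates the kind of decision such schemes make: a host triggers rebalancing only when OS-level CPU or memory utilization is overcommitted or imbalanced across hosts, and memory bandwidth never enters the decision.

# Illustrative sketch of a traditional DRM balancing check (assumed thresholds,
# hypothetical metric dictionaries). Only OS-level CPU and memory-capacity
# utilization are examined; memory bandwidth is never consulted.

CPU_THRESHOLD = 0.90       # assumed overcommit threshold
MEM_THRESHOLD = 0.90
IMBALANCE_LIMIT = 0.20     # assumed tolerated spread across hosts

def needs_rebalance(hosts):
    """hosts: list of dicts with 'cpu' and 'mem' utilization in [0, 1]."""
    if any(h["cpu"] > CPU_THRESHOLD or h["mem"] > MEM_THRESHOLD for h in hosts):
        return True
    for metric in ("cpu", "mem"):
        values = [h[metric] for h in hosts]
        if max(values) - min(values) > IMBALANCE_LIMIT:
            return True
    return False   # no OS-level imbalance: keep the current VM-to-Host mapping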
2.2.1 Lack of Microarchitecture-level Shared Resource
Interference Awareness
Prior works, including commercial products, base migration
decisions on operating-system-level metrics. However, such
metrics cannot capture the microarchitecture-level shared re-
source interference characteristics. Our real workload pro-
filing results (detailed in Section 6.1) show that there are
many workloads, e.g., STREAM and gromacs, that exhibit
similar CPU utilization and demand for memory capacity,
but have very different memory bandwidth consumption.
Thus, when VMs exhibit similar CPU and memory capac-
ity utilization and the host is not overcommitted (i.e., CPU
or memory is under-utilized), traditional DRM schemes that
are unaware of microarchitecture-level shared resource inter-
ference characteristics would not recognize a problem and
would let the current VM-to-host mapping continue. How-
ever, the physical host might, in reality, be experiencing
heavy contention at the microarchitecture-level shared re-
sources such as shared cache and main memory.
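Detecting such contention requires hardware-level measurement rather than OS-level metrics. As a rough illustration (not the mechanism used in this work), memory traffic can be approximated from last-level-cache miss counts sampled from performance counters; the counter source, the 64-byte line size, and the neglect of writeback/prefetch traffic are assumptions of this simple estimate.

# Sketch: approximating memory bandwidth from sampled LLC-miss counter deltas.
# The counter source (e.g., a PMU sampling tool) and the 64-byte cache line
# are assumptions; writebacks and prefetches are ignored, so this is only a
# lower-bound estimate intended to illustrate the idea.

CACHE_LINE_BYTES = 64

def estimate_mbw_gbps(llc_misses_start, llc_misses_end, interval_s):
    """Approximate memory read bandwidth (GB/s) over a sampling interval."""
    bytes_moved = (llc_misses_end - llc_misses_start) * CACHE_LINE_BYTES
    return bytes_moved / interval_s / 1e9

# Example: 5e8 demand misses over 2 s correspond to roughly 16 GB/s.
print(estimate_mbw_gbps(0, 5 * 10**8, 2.0))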
2.2.2 Offline Profiling to Characterize Interference
Some previous works [31, 37, 75] seek to mitigate inter-
ference between applications/VMs at the microarchitecture-
level shared resources by defining constraints based on of-
fline profiling of applications/VMs, such that applications
that contend with each other are not co-located. For instance,
in VMware DRS [31], rules can be defined for VM-to-VM
or VM-to-Host mappings. While such an offline-profiling-based approach could work in some scenarios, it has two major drawbacks. First, it might not
always be feasible to profile applications. For instance, in
a cloud service such as Amazon EC2 [2] where VMs are
leased to any user, it is not feasible to profile applications
offline. Second, even when workloads can be profiled of-
fline, due to workload phase changes and changing inputs,
the interference characteristics might differ from those observed when the profiling was performed. Hence, such an
offline profiling approach has limited applicability.
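To illustrate why static rules are brittle, the sketch below shows an anti-affinity check of the kind such approaches rely on (a hypothetical rule format and workload classes; not the VMware DRS rule syntax). Because the rule set is fixed at profiling time, it cannot cover unprofiled workloads or phase changes.

# Sketch: placement check against offline-profiling-derived anti-affinity
# rules (hypothetical rule format and classes, for illustration only). Each
# rule names two workload classes observed to contend when co-located.

ANTI_AFFINITY_RULES = {("STREAM", "STREAM")}   # assumed profiling outcome

def placement_allowed(vm_class, resident_classes):
    """Reject a placement that violates any anti-affinity rule."""
    for resident in resident_classes:
        if (vm_class, resident) in ANTI_AFFINITY_RULES or \
           (resident, vm_class) in ANTI_AFFINITY_RULES:
            return False
    return True

print(placement_allowed("STREAM", ["gromacs", "STREAM"]))   # False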
2.3 The Impact of Interference Unawareness
In this section, we use case studies to demonstrate the shortcomings of DRM schemes that are unaware of microarchitecture-level shared resource interference. We pick two ap-
plications: gromacs from the SPEC CPU2006 benchmark
suite [6] and STREAM [7]. STREAM and gromacs have
very similar memory capacity demand, while having very
different memory bandwidth usage: STREAM has high bandwidth demand, whereas gromacs has low demand (more workload pairs with such characteristics can be found in Section 6.1).
We run seven copies (VMs) of STREAM on Host A and
seven copies (VMs) of gromacs on Host B (initially). Both
of the hosts are SuperMicro servers equipped with two Intel
Xeon L5630 processors running at 2.13 GHz (detailed in
Section 5). Each VM is configured to have 1 vCPU and 2
GB memory.
Figure 1 shows the CPU utilization (CPU), the total memory capacity demand of the VMs as a fraction of host memory capacity (memory capacity utilization, MEM), and the memory bandwidth utilization (MBW) of the hosts when a traditional
DRM scheme, which relies on CPU utilization and mem-
ory capacity demand, is employed. We see that although the
memory bandwidth on Host A is heavily contended (close to the practically achievable peak bandwidth [21]), the traditional DRM scheme does nothing (i.e., does not migrate VMs), since the CPU and memory capacity on both Host A and Host B are under-utilized and the VMs on the two hosts have similar CPU and memory capacity demands.
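To make the contrast explicit, the sketch below extends the CPU/MEM-only criterion from Section 2.2 with a single memory-bandwidth check (the utilization values and the contention threshold are assumed for illustration, not measured): the CPU/MEM-only check sees no problem on Host A, while the MBW-aware check flags it.

# Sketch: adding an MBW check to the CPU/MEM-only criterion. The utilization
# values loosely mirror the case study but are assumed, not measured; the
# MBW threshold is an assumed fraction of the practically achievable peak.

MBW_THRESHOLD = 0.85

host_a = {"cpu": 0.50, "mem": 0.40, "mbw": 0.95}   # runs 7 STREAM VMs
host_b = {"cpu": 0.50, "mem": 0.40, "mbw": 0.15}   # runs 7 gromacs VMs

def cpu_mem_only_flags(host):
    return host["cpu"] > 0.90 or host["mem"] > 0.90

def mbw_aware_flags(host):
    return cpu_mem_only_flags(host) or host["mbw"] > MBW_THRESHOLD

print(cpu_mem_only_flags(host_a), mbw_aware_flags(host_a))   # False True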
Figure 2 shows the same information for the same two
hosts, Host A and Host B. However, we use a memory-
bandwidth-contention-aware DRM scheme to migrate the three VMs that consume the most memory bandwidth from Host A to Host B at 300, 600, and 900 seconds.
To keep the CPU resources from being oversubscribed, we
also migrate three VMs that have low memory bandwidth
requirements from Host B to Host A. We see that after the
three migrations, the memory bandwidth usage on Host A