Hardware-in-the-loop Simulation for CPU/GPU
Heterogeneous Platforms
Youngsub Ko¹, Taeyoung Kim¹, Youngmin Yi², Myungsun Kim³, Soonhoi Ha¹
¹,³ School of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea
³ DMC R&D Samsung Electronics, Suwon, Korea
² School of Electrical and Computer Engineering, University of Seoul, Seoul, Korea
¹ {kys4464, tykim, sha}@iris.snu.ac.kr, ² ymyi@uos.ac.kr, ³ mskim@redwood.snu.ac.kr
ABSTRACT
Multi-core CPU/GPU heterogeneous platforms have become popular
in embedded systems. A full system simulator is typically used to
observe the internal system behavior by running complete
software stacks without modification on simulation models of
CPUs and other devices in the system. However, there are few
known full system simulators for CPU/GPU heterogeneous
platforms, and existing GPU simulators are prohibitively slow for
running application software. In this paper, we propose a
hardware-in-the-loop simulation technique that integrates GPU
hardware into a full system simulator. We devise a novel
interfacing mechanism between the CPU simulator and the
development board that contains the GPU hardware. In our
experiments, we take the Exynos 4412 as a case study: the gem5
simulator mainly models the quad-core ARM CPU of the platform,
and an Exynos development board runs the Mali GPU hardware.
We successfully ran Android apps on the proposed
hardware-in-the-loop simulation framework at up to 1.5 M cycles
per second.
Keywords
HIL Simulation, CPU/GPU Heterogeneous platform, Mali GPU
1. INTRODUCTION
With the ever-increasing demand for computation in embedded
systems, a mobile GPU has become an essential component in
most of them. We can easily find many SoCs that
integrate both a CPU and a GPU: Tegra from NVIDIA,
Snapdragon from Qualcomm, and Exynos from Samsung, to name
a few. These chips are widely used on many platforms, ranging
from automobiles to high-performance smartphones and tablet
PCs. Since low power consumption is the major design constraint
in most computer systems these days, the trend towards
CPU/GPU heterogeneous platforms will continue, along with an
increasing number of cores in both CPUs and GPUs.
For architectural exploration of the system, as well as for
debugging and performance monitoring, a full system simulator is
typically used, on which complete software stacks can run without
modification. A full system simulator consists of simulation
models of CPUs, memories and a communication network as well
as peripherals. While CPU architectures have been studied for a
long time and many simulators are available for different
CPUs, only a few GPU simulators exist, and their simulation
speed is prohibitively slow for running application software:
gpgpu-sim [1] and Barra [2], both simulators of NVIDIA GPUs, run
at only dozens of kilo-cycles per second, and they
get even slower as the number of cores in a GPU increases. To the
best of our knowledge, there is no publicly available simulator for
widely used mobile GPUs such as Mali from ARM, PowerVR
from Imagination Technologies, and Adreno from Qualcomm.
To make full system simulation feasible for CPU/GPU
heterogeneous architectures, we propose a hardware-in-the-loop
(HIL) simulation technique that integrates existing GPU hardware
into a full system simulator. Enabling hardware-in-the-loop
simulation with a CPU simulator and an existing GPU development
board poses several challenges, of which we list the three major
ones. First, unlike a simulator or a typical FPGA emulator for
HW IPs, GPU hardware cannot be conveniently stopped and
resumed, which makes it difficult to integrate a development
board into a CPU simulator. Second, the modeled system has
on-chip memory shared by the CPU and the GPU, but the CPU
simulator and the GPU board each hold their own copy of it, so
we must synchronize the duplicated shared memory models and
keep them coherent. Third, the simulator
must coordinate carefully with the real GPU hardware to preserve
functional correctness. For example, interrupts between
processors must be modeled correctly, without violating
causality or introducing deadlock between the models.
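The second challenge can be illustrated with a minimal sketch. It assumes, purely for illustration, that each side tracks which pages of its own memory copy were written and that only those pages are copied to the other side at a synchronization point. The names `MirroredMemory`, `sync`, and `PAGE_SIZE` are hypothetical, and this is not necessarily the mechanism used in the proposed framework.

```python
PAGE_SIZE = 4096  # assumed page granularity for this sketch


class MirroredMemory:
    """One side's copy of the shared memory, with dirty-page tracking."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.dirty = set()  # page numbers written since the last sync

    def write(self, addr, payload):
        # Store the bytes and record every page the write touches.
        self.data[addr:addr + len(payload)] = payload
        first = addr // PAGE_SIZE
        last = (addr + len(payload) - 1) // PAGE_SIZE
        self.dirty.update(range(first, last + 1))

    def read(self, addr, length):
        return bytes(self.data[addr:addr + length])


def sync(src, dst):
    """Copy the pages dirtied in src into dst; return how many were copied."""
    copied = 0
    for page in sorted(src.dirty):
        start = page * PAGE_SIZE
        end = min(start + PAGE_SIZE, len(src.data))
        # Write dst.data directly so the copy does not mark dst as dirty.
        dst.data[start:end] = src.data[start:end]
        copied += 1
    src.dirty.clear()
    return copied


sim_mem = MirroredMemory(4 * PAGE_SIZE)    # copy held by the CPU simulator
board_mem = MirroredMemory(4 * PAGE_SIZE)  # copy held by the GPU board

sim_mem.write(PAGE_SIZE - 2, b"JOBS")      # CPU write straddling pages 0 and 1
pages_to_board = sync(sim_mem, board_mem)  # performed before the GPU starts

board_mem.write(3 * PAGE_SIZE, b"DONE")    # GPU writes its result
pages_to_sim = sync(board_mem, sim_mem)    # performed before the CPU resumes
```

In this sketch the copy is made only at explicit synchronization points, so the two models are coherent at those points and may diverge in between.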
To overcome these challenges, we devised a novel interfacing
mechanism between a CPU simulator and a development board
where the GPU hardware is integrated. To the best of our
knowledge, this is the first hardware-in-the-loop simulation
framework for a CPU/GPU heterogeneous embedded system that
can run complete software stacks without modification. The
proposed technique requires no modification of the GPU
drivers, the Linux kernel, or Android. The CPU simulator and
the GPU hardware interact at
calls to the GPU drivers, which can be detected easily without any
instrumentation. This makes it easy to port the simulation
interfaces to different GPUs and also enables more efficient
synchronization between the simulator and the board.
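One way to detect driver calls without instrumenting the guest software is for the simulator to compare each program counter (PC) value against the known entry addresses of the driver functions. The sketch below is a toy model under that assumption. The class `ToySimulator`, the addresses, and the symbol names `gpu_job_submit` and `gpu_irq_handler` are hypothetical, and the code uses neither gem5's API nor the actual Mali driver symbols.

```python
class ToySimulator:
    """Toy CPU model that only walks through a trace of PC values."""

    def __init__(self, driver_entries, on_driver_call):
        self.driver_entries = driver_entries  # entry address -> symbol name
        self.on_driver_call = on_driver_call  # hook that hands off to the board
        self.executed = 0

    def run(self, pc_trace):
        for pc in pc_trace:
            name = self.driver_entries.get(pc)
            if name is not None:
                # The guest is entering a GPU driver function. The guest
                # binary is untouched; only the simulator observes the PC.
                self.on_driver_call(name, pc)
            self.executed += 1


# Hypothetical entry addresses, e.g. taken from the kernel symbol table.
DRIVER_ENTRIES = {
    0xC0401000: "gpu_job_submit",
    0xC0401200: "gpu_irq_handler",
}

handoffs = []
sim = ToySimulator(DRIVER_ENTRIES,
                   lambda name, pc: handoffs.append((name, hex(pc))))
sim.run([0x8000, 0x8004, 0xC0401000, 0xC0401004, 0x8008, 0xC0401200])
```

Here `handoffs` records each point at which the simulator would synchronize with the board. Porting to a different GPU would, in this sketch, amount to supplying a different address table.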
The framework provides instruction-level accuracy for the computation
workload so that the simulation can run fast enough to explore the
design space of the system with the existing GPU hardware. For
instance, we can vary the number of CPU cores or change the type of
GPU and quickly evaluate the performance impact. On the other hand,
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyrights
for components of this work owned by others than the author(s) must be
honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee. Request permissions
from Permissions@acm.org.
DAC '14, June 01 - 05 2014, San Francisco, CA, USA
Copyright is held by the owner/author(s).
Publication rights licensed to ACM. ACM 978-1-4503-2730-5/14/06 $15.00.
http://dx.doi.org/10.1145/2593069.2593149